Abstract

New developments are presented in the framework of the model introduced by the
authors in refs. [1, 2], in which nucleotides as well as codons are classified in crystal
bases of the quantum group $U_q(sl(2) \oplus sl(2))$ in the limit $q \to 0$. An operator which
gives the correspondence between the amino-acids and the codons is now obtained for any
known genetic code. The free energy released by base pairing of dinucleotides as well as
the relative hydrophilicity and hydrophobicity of the dinucleosides are also computed. For
the vertebrate series, a universal behaviour in the ratios of codon usage frequencies is put
in evidence and is shown to fit nicely in our model. Then a first attempt to represent the
mutations relative to the deletion of a pyrimidine by action of a suitable crystal spinor
operator is proposed. Finally recent theoretical descriptions are reviewed and compared
with our model.

1 Introduction



Among the numerous and important questions offered to the physicist by the sciences of life,
the ones relative to the genetic code present a particular interest. Indeed, in addition to
the fundamental importance of this domain, the DNA structure on the one hand and the
mechanism of polypeptide fixation from codons on the other hand possess appealing aspects
for the theorist. Let us, in a brief summary, select some essential features [3]. First, as is well
known, the DNA macromolecule is constituted by two linear chains of nucleotides in a double
helix shape. There are four different nucleotides, characterized by their bases: adenine (A) and
guanine (G) deriving from purine, and cytosine (C) and thymine (T) coming from pyrimidine.
Note also that an A (resp. T) base in one strand is connected with two hydrogen bonds to a
T (resp. A) base in the other strand, while a C (resp. G) base is related to a G (resp. C)
base with three hydrogen bonds. The genetic information is transmitted to the cytoplasm via
the messenger ribonucleic acid or mRNA. During this operation, called transcription, the A,
G, C, T bases in the DNA are associated respectively to the U, C, G, A bases, U denoting the
uracil base. Then it will be through a ribosome that a triplet of nucleotides or codon will be
related to an amino-acid. More precisely, a codon is defined as an ordered sequence of three
nucleotides, e.g. AAG, ACG, etc., and one enumerates in this way $4 \times 4 \times 4 = 64$ different codons.
Following the universal eukaryotic code (see Table 4), 61 of such triplets can be connected in
an unambiguous way to the amino-acids; the three remaining triplets UAA, UAG and
UGA are called non-sense or stop-codons, their role being to stop the biosynthesis.
Indeed, the genetic code is the association between codons and amino-acids. But since one
distinguishes only 20 amino-acids$^1$ related to the 61 codons, it follows that the genetic code is
degenerate. Still considering the standard eukaryotic code, one observes sextets, quadruplets,
triplets, doublets and singlets of codons, each multiplet corresponding to a specific amino-acid.
Such a picture naturally suggests looking for an underlying symmetry able to describe the
observed multiplet structure, in the spirit of the dynamical symmetry schemes which have proven
so powerful in atomic, molecular and nuclear physics. We review these recent approaches at
the end of this paper.

In refs. [1, 2] we have proposed a mathematical framework in which the codons appear
as composite states of nucleotides. The four nucleotides being assigned to the fundamental
irreducible representation of the quantum group $U_q(sl(2) \oplus sl(2))$ in the limit $q \to 0$, the
codons are obtained as tensor products of nucleotides. Indeed, the properties of quantum group
representations in the limit $q \to 0$, or crystal basis, are well adapted to take into account the
nucleotide ordering. Some properties of this model have then been considered. We will generalize
some of them in the following and also propose new developments.

$^1$ Alanine (Ala), Arginine (Arg), Asparagine (Asn), Aspartic acid (Asp), Cysteine (Cys), Glutamine (Gln),
Glutamic acid (Glu), Glycine (Gly), Histidine (His), Isoleucine (Ile), Leucine (Leu), Lysine (Lys), Methionine
(Met), Phenylalanine (Phe), Proline (Pro), Serine (Ser), Threonine (Thr), Tryptophan (Trp), Tyrosine (Tyr),
Valine (Val).

The paper is organized as follows. We start in sect. 2 by recalling the main aspects of the
model. In sect. 3 we build out of the generators of $U_{q \to 0}(sl(2) \oplus sl(2))$ a reading operator, which
gives the correct correspondence between codons and amino-acids for each of the 12 presently
known genetic codes. This construction generalizes in a synthetic way the one started in [1]
for the eukaryotic and vertebrate mitochondrial codes, the different reading operators acting
on codons and providing the same eigenvalue for a given amino-acid whatever the considered
code. In sect. 4 some physical properties of dinucleotide states are fitted. In sect. 5, we analyze
ratios of codon usage frequency for several biological species belonging to the vertebrate class
and put in evidence a universal behaviour, which fits naturally in our model. In sect. 6, making
use of the general crystal basis mathematical framework, we represent the mutation induced
by the deletion of a pyrimidine by the action of a suitable crystal spinor operator. In sect. 7
we review and compare with our model the recent symmetry approaches to the genetic code.
Finally in sect. 8 we give a few conclusions and discuss some directions of future developments.



2 The Model 

We consider the four nucleotides as basic states of the $(\frac12, \frac12)$ representation of the
$U_q(sl(2) \oplus sl(2))$ quantum enveloping algebra in the limit $q \to 0$. A triplet of nucleotides will then be
obtained by constructing the tensor product of three such four-dimensional representations.
Actually, this approach mimics the group theoretical classification of baryons made out from
three quarks in elementary particle physics, the building blocks being here the A, C, G, T/U
nucleotides. The main and essential difference stands in the property of a codon to be an
ordered set of three nucleotides, which is not the case for a baryon.

Constructing such pure states is made possible in the framework of any algebra $U_{q \to 0}(\mathcal{G})$ with
$\mathcal{G}$ being any (semi)-simple classical Lie algebra owing to the existence of a special basis, called
crystal basis, in any (finite dimensional) representation of $\mathcal{G}$. The algebra $\mathcal{G} = sl(2) \oplus sl(2)$
appears the most natural for our purpose. The complementary rule in the DNA-mRNA transcription
may suggest to assign a quantum number with opposite values to the couples (A, T/U)
and (C, G). The distinction between the purine bases (A, G) and the pyrimidine ones (C, T/U)
can be algebraically represented in an analogous way. Thus considering the fundamental
representation $(\frac12, \frac12)$ of $sl(2) \oplus sl(2)$ and denoting $\pm$ the basis vector corresponding to the eigenvalues
$\pm\frac12$ of the $J_3$ generator in any of the two $sl(2)$ corresponding algebras, we will assume the
following "biological" spin structure:

$$
\begin{array}{ccc}
 & sl(2)_H & \\
\mathrm{C} \equiv (+,+) & \longleftrightarrow & \mathrm{U} \equiv (-,+) \\
sl(2)_V \updownarrow & & \updownarrow sl(2)_V \\
\mathrm{G} \equiv (+,-) & \longleftrightarrow & \mathrm{A} \equiv (-,-) \\
 & sl(2)_H &
\end{array}
\tag{2}
$$


the subscripts H (:= horizontal) and V (:= vertical) being just added to specify the algebra. 

Now, we consider the representations of $U_q(sl(2))$ and more specifically the crystal bases
obtained when $q \to 0$. Introducing in $U_{q \to 0}(sl(2))$ the operators $J_+$ and $J_-$ after modification
of the corresponding simple root vectors of $U_q(sl(2))$, a particular kind of basis in a $U_q(sl(2))$-module
can be defined. Such a basis is called a crystal basis and has the property of undergoing
in a specially simple way the action of the $J_+$ and $J_-$ operators: as an example, for any couple
of vectors $u, v$ in the crystal basis $\mathcal{B}$, one gets $u = J_+ v$ if and only if $v = J_- u$. More interesting
for our purpose is the crystal basis in the tensor product of two representations. Then the
following theorem holds [4] (written here in the case of $sl(2)$):

Theorem 1 (Kashiwara) Let $\mathcal{B}_1$ and $\mathcal{B}_2$ be the crystal bases of the $M_1$ and $M_2$ $U_{q \to 0}(sl(2))$-modules
respectively. Then for $u \in \mathcal{B}_1$ and $v \in \mathcal{B}_2$, we have:

$$
J_-(u \otimes v) =
\begin{cases}
J_- u \otimes v & \exists\, n \ge 1 \text{ such that } J_-^n u \ne 0 \text{ and } J_+ v = 0 \\
u \otimes J_- v & \text{otherwise}
\end{cases}
\tag{3}
$$

$$
J_+(u \otimes v) =
\begin{cases}
u \otimes J_+ v & \exists\, n \ge 1 \text{ such that } J_+^n v \ne 0 \text{ and } J_- u = 0 \\
J_+ u \otimes v & \text{otherwise}
\end{cases}
\tag{4}
$$

Note that the tensor product of two representations in the crystal basis is not commutative.
However, in the case of our model, we only need to construct the $n$-fold tensor product of the
fundamental representation $(\frac12, \frac12)$ of $U_{q \to 0}(sl(2) \oplus sl(2))$ by itself, thus preserving commutativity
and associativity.
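The action described by Theorem 1 can be illustrated on pairs of nucleotides with a minimal Python sketch. It assumes the (H, V) sign assignment of the "biological" spin structure above; the names `SPIN`, `act`, `lower`, `raise_` and `twice_spin` are hypothetical helpers introduced only for this illustration, and for a single spin-1/2 letter the condition "there exists $n \ge 1$ with $J_-^n u \ne 0$" reduces to $J_- u \ne 0$.

```python
from collections import defaultdict
from itertools import product

# (H, V) signs of the "biological spin" assignment: +1 stands for +1/2, -1 for -1/2.
SPIN = {"C": (+1, +1), "U": (-1, +1), "G": (+1, -1), "A": (-1, -1)}
NUCLEOTIDE = {signs: name for name, signs in SPIN.items()}


def act(nucleotide, axis, sign):
    """J_+ (sign=+1) or J_- (sign=-1) of sl(2)_H (axis=0) or sl(2)_V (axis=1)
    on a single nucleotide; None stands for the zero vector."""
    signs = list(SPIN[nucleotide])
    if signs[axis] == sign:  # already at the top (J_+) or bottom (J_-) of the doublet
        return None
    signs[axis] = sign
    return NUCLEOTIDE[tuple(signs)]


def lower(pair, axis):
    """J_-(u (x) v), following eq. (3)."""
    u, v = pair
    if act(u, axis, -1) is not None and act(v, axis, +1) is None:
        return act(u, axis, -1) + v
    w = act(v, axis, -1)
    return None if w is None else u + w


def raise_(pair, axis):
    """J_+(u (x) v), following eq. (4)."""
    u, v = pair
    if act(v, axis, +1) is not None and act(u, axis, -1) is None:
        return u + act(v, axis, +1)
    w = act(u, axis, +1)
    return None if w is None else w + v


def twice_spin(pair, axis):
    """Number of J_- and J_+ steps available from `pair`, i.e. 2j."""
    steps = 0
    for op in (lower, raise_):
        state = op(pair, axis)
        while state is not None:
            steps += 1
            state = op(state, axis)
    return steps


# Group the 16 ordered pairs by (2 j_H, 2 j_V).
multiplets = defaultdict(set)
for u, v in product("CUGA", repeat=2):
    pair = u + v
    multiplets[(twice_spin(pair, 0), twice_spin(pair, 1))].add(pair)

for (two_jh, two_jv), members in sorted(multiplets.items()):
    print(f"(j_H, j_V) = ({two_jh / 2}, {two_jv / 2}): {sorted(members)}")
```

Each ordered pair ends up in exactly one multiplet as a pure state, which is the feature of the crystal basis exploited in the model: for instance CU is a singlet under $sl(2)_H$, whereas UC belongs to a triplet.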

Let us insist on the choice of the crystal basis, which exists only in the limit $q \to 0$. In a
codon the order of the nucleotides is of fundamental importance (e.g. CCU → Pro, CUC →
Leu, UCC → Ser). If we want to consider the codons as composite states of the (elementary)
nucleotides, this surely cannot be done in the framework of Lie (super)algebras. Indeed in the
Lie theory, the composite states are obtained by performing tensor products of the fundamental
irreducible representations. They appear as linear combinations of the elementary states, with
symmetry properties determined from the tensor product (i.e. for $sl(n)$, by the structure of
the corresponding Young tableaux). On the contrary the crystal basis provides us with the
mathematical structure to build composite states as pure states, characterized by the order of
the constituents. In order to dispose of such a basis, we need to consider the limit $q \to 0$.
Note that in this limit we do not deal anymore either with a Lie algebra or with a universal
deformed enveloping algebra.

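To represent a codon, we have to perform the tensor product of three $(\frac12, \frac12)$ representations
of $U_{q \to 0}(sl(2) \oplus sl(2))$. However, it is well known (see Table 4) that in a multiplet of codons
relative to a specific amino-acid, the first two bases of a codon are "relatively
stable", the degeneracy being mainly generated by the third nucleotide. We consider first the
tensor product:

$$
(\tfrac12, \tfrac12) \otimes (\tfrac12, \tfrac12) = (1,1) \oplus (1,0) \oplus (0,1) \oplus (0,0)
\tag{5}
$$

where inside the parentheses, $j = 0, \frac12, 1$ is put in place of the $2j + 1 = 1, 2, 3$
dimensional $sl(2)$ representation. We get, using Theorem 1, the following tableau
(rows run along $su(2)_V$, columns along $su(2)_H$):

$$
\begin{array}{ll}
(0,0) & \begin{pmatrix} \mathrm{CA} \end{pmatrix} \\[6pt]
(1,0) & \begin{pmatrix} \mathrm{CG} & \mathrm{UG} & \mathrm{UA} \end{pmatrix} \\[6pt]
(0,1) & \begin{pmatrix} \mathrm{CU} \\ \mathrm{GU} \\ \mathrm{GA} \end{pmatrix} \\[6pt]
(1,1) & \begin{pmatrix} \mathrm{CC} & \mathrm{UC} & \mathrm{UU} \\ \mathrm{GC} & \mathrm{AC} & \mathrm{AU} \\ \mathrm{GG} & \mathrm{AG} & \mathrm{AA} \end{pmatrix}
\end{array}
$$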



From Table 4, the dinucleotide states formed by the first two nucleotides in a codon can be put
in correspondence with quadruplets, doublets or singlets of codons relative to an amino-acid.
Note that the sextets (resp. triplets) are viewed as the sum of a quadruplet and a doublet
(resp. a doublet and a singlet). Let us define the "charge" $Q$ of a dinucleotide state by

$$
Q = J^{(1)}_{H,3} + J^{(2)}_{H,3} + J^{(2)}_{V,3}
\tag{6}
$$

where the superscript (1) or (2) denotes the position of a nucleotide in the dinucleotide state.
The dinucleotide states are then split into two octets with respect to the charge $Q$: the eight
strong dinucleotides associated to the quadruplets (as well as those included in the sextets) of
codons satisfy $Q > 0$, while the eight weak dinucleotides associated to the doublets (as well as
those included in the triplets) and possibly to the singlets of codons satisfy $Q < 0$. Let us
remark that by the change C ↔ A and U ↔ G, which is equivalent to the change of the sign
of $J_{3,\alpha}$ or to reflection with respect to the diagonals of eq. (2), the 8 strong dinucleotides are
transformed into weak ones and vice-versa.
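The split by the sign of $Q$ can be checked by direct enumeration. The following Python sketch assumes that the charge is the sum of the horizontal spin projections of both nucleotides and of the vertical projection of the second one, with the $(J_{H,3}, J_{V,3})$ values read off the "biological" spin structure; the names `SPIN`, `charge` and `SWAP` are illustrative only.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)
# (J_H3, J_V3) of each nucleotide: C = (+,+), U = (-,+), G = (+,-), A = (-,-).
SPIN = {"C": (HALF, HALF), "U": (-HALF, HALF), "G": (HALF, -HALF), "A": (-HALF, -HALF)}


def charge(dinucleotide):
    """Q = J_H3 of nucleotide 1 + J_H3 of nucleotide 2 + J_V3 of nucleotide 2."""
    first, second = dinucleotide
    return SPIN[first][0] + SPIN[second][0] + SPIN[second][1]


charges = {u + v: charge(u + v) for u, v in product("CUGA", repeat=2)}
strong = sorted(d for d, q in charges.items() if q > 0)
weak = sorted(d for d, q in charges.items() if q < 0)
print("strong:", strong)
print("weak:  ", weak)

# The exchange C <-> A, U <-> G flips both spin projections, hence the sign of Q.
SWAP = str.maketrans("CAUG", "ACGU")
assert all(charges[d.translate(SWAP)] == -charges[d] for d in charges)
```

Since $Q$ is a sum of three half-integers it never vanishes, so every dinucleotide is either strong or weak.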

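If we consider the three-fold tensor product, the content into irreducible representations of
$U_{q \to 0}(sl(2) \oplus sl(2))$ is given by:

$$
(\tfrac12, \tfrac12) \otimes (\tfrac12, \tfrac12) \otimes (\tfrac12, \tfrac12)
= (\tfrac32, \tfrac32) \oplus 2\,(\tfrac32, \tfrac12) \oplus 2\,(\tfrac12, \tfrac32) \oplus 4\,(\tfrac12, \tfrac12)
\tag{7}
$$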


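The structure of the irreducible representations of the r.h.s. of Eq. (7) is (the upper labels
denote different irreducible representations):

$$
(\tfrac32, \tfrac32) =
\begin{pmatrix}
\mathrm{CCC} & \mathrm{UCC} & \mathrm{UUC} & \mathrm{UUU} \\
\mathrm{GCC} & \mathrm{ACC} & \mathrm{AUC} & \mathrm{AUU} \\
\mathrm{GGC} & \mathrm{AGC} & \mathrm{AAC} & \mathrm{AAU} \\
\mathrm{GGG} & \mathrm{AGG} & \mathrm{AAG} & \mathrm{AAA}
\end{pmatrix}
$$

$$
(\tfrac32, \tfrac12)^1 =
\begin{pmatrix}
\mathrm{CCG} & \mathrm{UCG} & \mathrm{UUG} & \mathrm{UUA} \\
\mathrm{GCG} & \mathrm{ACG} & \mathrm{AUG} & \mathrm{AUA}
\end{pmatrix}
\qquad
(\tfrac32, \tfrac12)^2 =
\begin{pmatrix}
\mathrm{CGC} & \mathrm{UGC} & \mathrm{UAC} & \mathrm{UAU} \\
\mathrm{CGG} & \mathrm{UGG} & \mathrm{UAG} & \mathrm{UAA}
\end{pmatrix}
$$

$$
(\tfrac12, \tfrac32)^1 =
\begin{pmatrix}
\mathrm{CCU} & \mathrm{UCU} \\
\mathrm{GCU} & \mathrm{ACU} \\
\mathrm{GGU} & \mathrm{AGU} \\
\mathrm{GGA} & \mathrm{AGA}
\end{pmatrix}
\qquad
(\tfrac12, \tfrac32)^2 =
\begin{pmatrix}
\mathrm{CUC} & \mathrm{CUU} \\
\mathrm{GUC} & \mathrm{GUU} \\
\mathrm{GAC} & \mathrm{GAU} \\
\mathrm{GAG} & \mathrm{GAA}
\end{pmatrix}
$$

$$
(\tfrac12, \tfrac12)^1 =
\begin{pmatrix}
\mathrm{CCA} & \mathrm{UCA} \\
\mathrm{GCA} & \mathrm{ACA}
\end{pmatrix}
\qquad
(\tfrac12, \tfrac12)^2 =
\begin{pmatrix}
\mathrm{CGU} & \mathrm{UGU} \\
\mathrm{CGA} & \mathrm{UGA}
\end{pmatrix}
$$

$$
(\tfrac12, \tfrac12)^3 =
\begin{pmatrix}
\mathrm{CUG} & \mathrm{CUA} \\
\mathrm{GUG} & \mathrm{GUA}
\end{pmatrix}
\qquad
(\tfrac12, \tfrac12)^4 =
\begin{pmatrix}
\mathrm{CAC} & \mathrm{CAU} \\
\mathrm{CAG} & \mathrm{CAA}
\end{pmatrix}
$$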

The correspondence with the amino-acids is given in Table 10 (for the eukaryotic code).
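The assignment of the 64 codons to the irreducible representations of Eq. (7) can be reproduced with a short Python sketch. It relies on the bracketing (signature) form of the crystal tensor product rule, which is used here as an assumption equivalent to iterating Theorem 1 on words of spin-1/2 letters: every "+" followed by a "-" is cancelled, the number of surviving signs is $2j$ and their sum is $2m$. The names `unpaired` and `labels` are hypothetical.

```python
from collections import Counter
from itertools import product

# (H, V) signs: +1 stands for +1/2, -1 for -1/2.
SPIN = {"C": (+1, +1), "U": (-1, +1), "G": (+1, -1), "A": (-1, -1)}


def unpaired(signs):
    """Cancel each '+' with the next available '-' to its right;
    the surviving signs have the form - ... - + ... +."""
    stack = []
    for s in signs:
        if s == -1 and stack and stack[-1] == +1:
            stack.pop()
        else:
            stack.append(s)
    return stack


def labels(word):
    """((2 j_H, 2 m_H), (2 j_V, 2 m_V)) of a nucleotide word."""
    out = []
    for axis in (0, 1):
        rest = unpaired([SPIN[n][axis] for n in word])
        out.append((len(rest), sum(rest)))
    return tuple(out)


codons = ["".join(c) for c in product("CUGA", repeat=3)]

# Number of codons carrying each pair (2 j_H, 2 j_V).
sizes = Counter((labels(c)[0][0], labels(c)[1][0]) for c in codons)
# Highest-weight codons: m = j for both sl(2)_H and sl(2)_V.
highest = sorted(c for c in codons if all(two_j == two_m for two_j, two_m in labels(c)))

print(sizes)
print(highest)
```

Each label $(j_H, j_V)$ collects 16 codons, in agreement with the dimensions $16 + 2 \cdot 8 + 2 \cdot 8 + 4 \cdot 4$ of Eq. (7), and the nine highest-weight codons are the upper-left entries of the nine multiplets listed above.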

Let us close this section by drawing the reader's attention to Fig. 1, where the position of
each codon in the appropriate representation is specified. The diagram of states for each
representation is supposed to lie in a separate parallel plane. Thick lines connect codons
associated to the same amino-acid. One remarks that each segment relates a couple of codons
belonging to the same representation or to two different representations. This last case occurs
for quadruplets or sextets of codons associated to the same amino-acid.

3 The Reading (or Ribosome) operator $\mathcal{R}$
3.1 General structure of the reading operator

As expected from formula (7), our model does not gather codons associated to one particular
amino-acid in the same irreducible multiplet. However, it is possible to construct an operator
$\mathcal{R}$ out of the algebra $U_{q \to 0}(sl(2) \oplus sl(2))$, acting on the codons, that will describe the various
genetic codes in the following way:

Two codons have the same eigenvalue under $\mathcal{R}$ if and only if they are associated to the same
amino-acid. This operator $\mathcal{R}$ will be called the reading operator.

It is a remarkable fact that the various genetic codes share the same basic structure. As we
mentioned above, the dinucleotides can be split into "strong" dinucleotides CC, GC, UC, AC,
CU, GU, CG and GG that lead to quartets and "weak" ones UU, AU, UG, AG, CA, GA, UA,
AA that lead to doublets. Let us construct a prototype of the reading operator that reproduces
this structure.

The first part of the reading operator $\mathcal{R}$ is responsible for the structure in quadruplets given
essentially by the dinucleotide content. It is given by (the $c_i$ are arbitrary coefficients)

$$
\tfrac43 c_1\, C_H + \tfrac43 c_2\, C_V - 4 c_1\, \mathcal{P}_H\, J_{H,3} - 4 c_2\, \mathcal{P}_V\, J_{V,3} .
\tag{8}
$$

The operators $J_{\alpha,3}$ ($\alpha = H, V$) are the third components of the total spin generators of the
algebra $U_{q \to 0}(sl(2) \oplus sl(2))$. The operator $C_\alpha$ is a Casimir operator of $U_{q \to 0}(sl(2)_\alpha)$ in the crystal
basis. It commutes with $J_{\alpha\pm}$ and $J_{\alpha,3}$ and its eigenvalue on any basis vector of an irreducible
representation of highest weight $J$ is $J(J+1)$, that is the same as the undeformed standard
second degree Casimir operator of $sl(2)$. Its explicit expression is

$$
C_\alpha = (J_{\alpha,3})^2 + \frac12 \sum_{n \in \mathbb{Z}_+} \sum_{k=0}^{n} (J_{\alpha-})^{n-k} (J_{\alpha+})^{n} (J_{\alpha-})^{k} .
\tag{9}
$$

Note that for $sl(2)_{q \to 0}$ the Casimir operator is an infinite series of powers of $J_{\alpha\pm}$. However
in any finite irreducible representation only a finite number of terms gives a non-vanishing
contribution.

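$\mathcal{P}_H$ and $\mathcal{P}_V$ are projectors given by the following expressions:

$$
\mathcal{P}_H = J^d_{H+} J^d_{H-} \quad \text{and} \quad \mathcal{P}_V = J^d_{V+} J^d_{V-} .
\tag{10}
$$

The second part of $\mathcal{R}$ gives rise to the splitting of the quadruplets into doublets. It reads

$$
-2\, \mathcal{P}_D\, c_3\, J_{V,3}
\tag{11}
$$

where the projector $\mathcal{P}_D$ is given by

$$
\begin{aligned}
\mathcal{P}_D = {} & (1 - J^d_{V+} J^d_{V-})(J^d_{H+} J^d_{H-})(J^d_{H-} J^d_{H+}) + (1 - J^d_{H+} J^d_{H-})(1 - J^d_{V+} J^d_{V-}) \\
& + (1 - J^d_{H+} J^d_{H-})(J^d_{V+} J^d_{V-})(J^d_{H-} J^d_{H+}) .
\end{aligned}
\tag{12}
$$

The third part of $\mathcal{R}$ allows one to reproduce the sextets viewed as quartets plus doublets. It is

$$
-2\, \mathcal{P}_S\, c_4\, J_{V,3}
\tag{13}
$$

where the projector $\mathcal{P}_S$ is given by

$$
\mathcal{P}_S = (J^d_{H-} J^d_{H+}) \left[ (J^d_{H+} J^d_{H-})(1 - J^d_{V+} J^d_{V-}) + (J^d_{V+} J^d_{V-})(J^d_{V-} J^d_{V+})(1 - J^d_{H+} J^d_{H-}) \right] .
\tag{14}
$$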
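Which dinucleotides are selected by such projectors can be checked numerically. The Python sketch below assumes that the operators $J^d_{\alpha\pm}$ act on the dinucleotide formed by the first two nucleotides, and that a product such as $J^d_{H+} J^d_{H-}$ takes the value 1 on a crystal basis state that can be lowered and 0 otherwise (a consequence of $u = J_+ v \Leftrightarrow v = J_- u$). The projectors $\mathcal{P}_D$ and $\mathcal{P}_S$ are written as products of these 0/1 flags, chosen so as to reproduce the structure of doublets and sextets; the bracketing rule and all function names are assumptions of this illustration.

```python
from itertools import product

# (H, V) signs: +1 stands for +1/2, -1 for -1/2.
SPIN = {"C": (+1, +1), "U": (-1, +1), "G": (+1, -1), "A": (-1, -1)}


def unpaired(signs):
    """Bracketing rule: cancel each '+' with the next available '-' to its right."""
    stack = []
    for s in signs:
        if s == -1 and stack and stack[-1] == +1:
            stack.pop()
        else:
            stack.append(s)
    return stack


def flags(dinucleotide):
    """Eigenvalues (0 or 1) of J+J- and J-J+ for sl(2)_H, then for sl(2)_V."""
    out = []
    for axis in (0, 1):
        rest = unpaired([SPIN[n][axis] for n in dinucleotide])
        out.append(int(+1 in rest))  # J+J- = 1 iff the state can be lowered
        out.append(int(-1 in rest))  # J-J+ = 1 iff the state can be raised
    return out


def p_d(dinucleotide):
    a, b, c, _ = flags(dinucleotide)
    return (1 - c) * a * b + (1 - a) * (1 - c) + (1 - a) * c * b


def p_s(dinucleotide):
    a, b, c, d = flags(dinucleotide)
    return b * (a * (1 - c) + c * d * (1 - a))


pairs = ["".join(p) for p in product("CUGA", repeat=2)]
doublet_pairs = sorted(p for p in pairs if p_d(p) == 1)
sextet_pairs = sorted(p for p in pairs if p_s(p) == 1)
print("P_D = 1 on:", doublet_pairs)
print("P_S = 1 on:", sextet_pairs)
```

With these conventions $\mathcal{P}_D$ equals 1 exactly on the eight weak dinucleotides, and $\mathcal{P}_S$ equals 1 on AU, UG and AG only, both taking the value 0 elsewhere.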

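At this point, one obtains the eigenvalues of the reading operator $\mathcal{R}$ for the 64 codons, where
Y = C, U (pyrimidines), R = G, A (purines) and N = C, U, G, A:

$$
\begin{array}{llll}
\mathrm{CCN} = -c_1 - c_2 & \mathrm{GCN} = -c_1 + 3c_2 & \mathrm{UCN} = 3c_1 - c_2 & \mathrm{ACN} = 3c_1 + 3c_2 \\
\mathrm{CUN} = c_1 - c_2 & \mathrm{GUN} = c_1 + 3c_2 & \mathrm{CGN} = -c_1 + c_2 & \mathrm{GGN} = -c_1 + 5c_2 \\
\mathrm{UUY} = 5c_1 - c_2 - 3c_3 & \mathrm{UUR} = 5c_1 - c_2 - c_3 & \mathrm{AUY} = 5c_1 + 3c_2 - c_3 - c_4 & \mathrm{AUR} = 5c_1 + 3c_2 + c_3 + c_4 \\
\mathrm{UGY} = 3c_1 + c_2 - c_3 - c_4 & \mathrm{UGR} = 3c_1 + c_2 + c_3 + c_4 & \mathrm{AGY} = 3c_1 + 5c_2 + c_3 + c_4 & \mathrm{AGR} = 3c_1 + 5c_2 + 3c_3 + 3c_4 \\
\mathrm{CAY} = c_1 + c_2 - c_3 & \mathrm{CAR} = c_1 + c_2 + c_3 & \mathrm{GAY} = c_1 + 5c_2 + c_3 & \mathrm{GAR} = c_1 + 5c_2 + 3c_3 \\
\mathrm{UAY} = 5c_1 + c_2 - c_3 & \mathrm{UAR} = 5c_1 + c_2 + c_3 & \mathrm{AAY} = 5c_1 + 5c_2 + c_3 & \mathrm{AAR} = 5c_1 + 5c_2 + 3c_3
\end{array}
\tag{15}
$$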



The coefficients $c_3$ and $c_4$ are fixed as follows. The coefficient $c_3$ is set to the value $c_3 = 4c_1$ by
requiring that the quartet CUN and the doublet UUR, associated to the amino-acid Leu, lead
to the same $\mathcal{R}$-eigenvalue. It remains to reproduce the Ser sextet. This is achieved by taking
for the coefficient $c_4$ the value $c_4 = -4c_1 - 6c_2$, such that the final eigenvalues for the codons
are the following:

$$
\begin{array}{llll}
\mathrm{CCN} = -c_1 - c_2 & \mathrm{GCN} = -c_1 + 3c_2 & \mathrm{UCN} = 3c_1 - c_2 & \mathrm{ACN} = 3c_1 + 3c_2 \\
\mathrm{CUN} = c_1 - c_2 & \mathrm{GUN} = c_1 + 3c_2 & \mathrm{CGN} = -c_1 + c_2 & \mathrm{GGN} = -c_1 + 5c_2 \\
\mathrm{UUY} = -7c_1 - c_2 & \mathrm{UUR} = c_1 - c_2 & \mathrm{AUY} = 5c_1 + 9c_2 & \mathrm{AUR} = 5c_1 - 3c_2 \\
\mathrm{UGY} = 3c_1 + 7c_2 & \mathrm{UGR} = 3c_1 - 5c_2 & \mathrm{AGY} = 3c_1 - c_2 & \mathrm{AGR} = 3c_1 - 13c_2 \\
\mathrm{CAY} = -3c_1 + c_2 & \mathrm{CAR} = 5c_1 + c_2 & \mathrm{GAY} = 5c_1 + 5c_2 & \mathrm{GAR} = 13c_1 + 5c_2 \\
\mathrm{UAY} = c_1 + c_2 & \mathrm{UAR} = 9c_1 + c_2 & \mathrm{AAY} = 9c_1 + 5c_2 & \mathrm{AAR} = 17c_1 + 5c_2
\end{array}
\tag{16}
$$

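The prototype of the reading operator $\mathcal{R}$ takes finally the form:

$$
\mathcal{R} = \tfrac43 c_1 C_H + \tfrac43 c_2 C_V - 4c_1 \mathcal{P}_H J_{H,3} - 4c_2 \mathcal{P}_V J_{V,3}
+ \left( -8c_1 \mathcal{P}_D + (8c_1 + 12c_2) \mathcal{P}_S \right) J_{V,3}
\tag{17}
$$

and the correspondence codons/amino-acids is given as follows:

$$
\begin{array}{llll}
\mathrm{CCN} \to \mathrm{Pro} & \mathrm{UCN} \to \mathrm{Ser} & \mathrm{GCN} \to \mathrm{Ala} & \mathrm{ACN} \to \mathrm{Thr} \\
\mathrm{CUN} \to \mathrm{Leu} & \mathrm{GUN} \to \mathrm{Val} & \mathrm{CGN} \to \mathrm{Arg} & \mathrm{GGN} \to \mathrm{Gly} \\
\mathrm{UUY} \to \mathrm{Phe} & \mathrm{AUY} \to \mathrm{Ile} & \mathrm{UGY} \to \mathrm{Cys} & \mathrm{AGY} \to \mathrm{Ser} \\
\mathrm{UUR} \to \mathrm{Leu} & \mathrm{AUR} \to \mathrm{Met} & \mathrm{UGR} \to \mathrm{Trp} & \mathrm{AGR} \to \text{unassigned (X)} \\
\mathrm{CAY} \to \mathrm{His} & \mathrm{UAY} \to \mathrm{Tyr} & \mathrm{GAY} \to \mathrm{Asp} & \mathrm{AAY} \to \mathrm{Asn} \\
\mathrm{CAR} \to \mathrm{Gln} & \mathrm{UAR} \to \mathrm{Ter} & \mathrm{GAR} \to \mathrm{Glu} & \mathrm{AAR} \to \mathrm{Lys}
\end{array}
\tag{18}
$$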
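The eigenvalues of the prototype operator can be evaluated codon by codon. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: the Casimir is replaced by its eigenvalue $J(J+1)$, the spin labels are obtained from the bracketing form of the crystal tensor product rule, and the projectors are evaluated through 0/1 flags telling whether the dinucleotide formed by the first two nucleotides can be lowered or raised. The sample values `C1 = 10`, `C2 = 1` are arbitrary, and all names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction as F
from itertools import product

# (H, V) signs: +1 stands for +1/2, -1 for -1/2.
SPIN = {"C": (+1, +1), "U": (-1, +1), "G": (+1, -1), "A": (-1, -1)}


def unpaired(signs):
    """Bracketing rule: cancel each '+' with the next available '-' to its right."""
    stack = []
    for s in signs:
        if s == -1 and stack and stack[-1] == +1:
            stack.pop()
        else:
            stack.append(s)
    return stack


def spin(word, axis):
    """(j, m) of a nucleotide word under sl(2)_H (axis=0) or sl(2)_V (axis=1)."""
    rest = unpaired([SPIN[n][axis] for n in word])
    return F(len(rest), 2), F(sum(rest), 2)


def reading(codon, c1, c2):
    """Eigenvalue of the prototype reading operator on a codon."""
    (jh, mh), (jv, mv) = spin(codon, 0), spin(codon, 1)
    (dh, nh), (dv, nv) = spin(codon[:2], 0), spin(codon[:2], 1)
    a, b = int(nh > -dh), int(nh < dh)  # J+J- and J-J+ of the dinucleotide, sl(2)_H
    c, d = int(nv > -dv), int(nv < dv)  # same for sl(2)_V
    p_d = (1 - c) * a * b + (1 - a) * (1 - c) + (1 - a) * c * b
    p_s = b * (a * (1 - c) + c * d * (1 - a))
    return (F(4, 3) * (c1 * jh * (jh + 1) + c2 * jv * (jv + 1))
            - 4 * c1 * a * mh - 4 * c2 * c * mv
            + (-8 * c1 * p_d + (8 * c1 + 12 * c2) * p_s) * mv)


C1, C2 = 10, 1  # arbitrary sample coefficients
classes = defaultdict(list)
for codon in map("".join, product("CUGA", repeat=3)):
    classes[reading(codon, C1, C2)].append(codon)

for value, members in sorted(classes.items()):
    print(value, members)
```

For these sample coefficients the 64 codons fall into 22 classes: the 21 classes of the prototype code (20 amino-acids and Ter) plus the unassigned doublet AGR. In particular the six Leu codons and the six Ser codons each share a single eigenvalue.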

3.2 The various genetic codes 

In this section, we will determine the reading operators for the following genetic codes:

- the Eukaryotic Code (EC),
- the Vertebrate Mitochondrial Code (VMC),
- the Yeast Mitochondrial Code (YMC),
- the Invertebrate Mitochondrial Code (IMC),
- the Protozoan Mitochondrial and Mycoplasma Code (PMC),
- the Echinoderm Mitochondrial Code (EMC),
- the Ascidian Mitochondrial Code (AMC),
- the Flatworm Mitochondrial Code (FMC),
- the Ciliate Nuclear Code (CNC),
- the Blepharisma Nuclear Code (BNC),
- the Euplotid Nuclear Code (ENC),
- the Alternative Yeast Nuclear Code (alt. YNC).

Let us emphasize that each of these codes is very close to the assignment (18). The main
differences between the biological codes and the prototype code (18) are the following:

• assignment of the doublet AGR either to Arg (codes EC, YMC, PMC, CNC, BNC, ENC,
aYNC), Ser (codes IMC, EMC, FMC), Gly (code AMC) or the stop signal Ter (code
VMC).

Such an assignment is done by the following term in the reading operator:

$$
c_5\, \mathcal{P}_{AG}\, (\tfrac12 - J^{(3)}_{V,3})
\tag{19}
$$

The operators $J^{(3)}_{\alpha,3}$ are the third components corresponding to the third nucleotide of a
codon. Of course, these last two operators can be replaced by $J^{(3)}_{\alpha,3} = J_{\alpha,3} - J^d_{\alpha,3}$.
The projector $\mathcal{P}_{AG}$ is given by

$$
\mathcal{P}_{AG} = (J^d_{H+} J^d_{H-})(J^d_{H-} J^d_{H+})(1 - J^d_{V+} J^d_{V-})(J^d_{V-} J^d_{V+})
\tag{20}
$$

and the coefficient $c_5$ by

$$
\begin{array}{ll}
\text{for Arg} & c_5 = -4c_1 + 14c_2 \\
\text{for Ser} & c_5 = 12c_2 \\
\text{for Gly} & c_5 = -4c_1 + 18c_2 \\
\text{for Ter} & c_5 = 6c_1 + 14c_2
\end{array}
\tag{21}
$$

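• splitting of some doublets into singlets (one element of the split doublet combining with
another doublet to form a triplet):

Met → Met + Ile for the EC, PMC, EMC, FMC, CNC, BNC, ENC, aYNC codes;
Lys → Lys + Asn for the FMC and EMC codes;
Trp → Trp + Ter for the EC, CNC, BNC, aYNC codes;
Trp → Trp + Cys for the ENC code;
Ter → Tyr + Ter for the FMC code.

Such an assignment is done through the following term in the reading operator:

$$
c_6\, \mathcal{P}_{XY}\, (\tfrac12 - J^{(3)}_{V,3})(\tfrac12 - J^{(3)}_{H,3})
\tag{22}
$$

where we use the projector $\mathcal{P}_{AU}$ for the splitting of the Met doublet, $\mathcal{P}_{AA}$ for the Lys
doublet, $\mathcal{P}_{UG}$ for the Trp doublet, and $\mathcal{P}_{UA}$ for the Ter doublet. These projectors are
given by

$$
\mathcal{P}_{AU} = (1 - J^d_{H+} J^d_{H-})(J^d_{H-} J^d_{H+})(J^d_{V+} J^d_{V-})(J^d_{V-} J^d_{V+})
\tag{23}
$$

$$
\mathcal{P}_{AA} = (1 - J^d_{H+} J^d_{H-})(J^d_{H-} J^d_{H+})(1 - J^d_{V+} J^d_{V-})(J^d_{V-} J^d_{V+})
\tag{24}
$$

$$
\mathcal{P}_{UG} = (J^d_{H+} J^d_{H-})(J^d_{H-} J^d_{H+})(1 - J^d_{V+} J^d_{V-})(1 - J^d_{V-} J^d_{V+})
\tag{25}
$$

$$
\mathcal{P}_{UA} = (1 - J^d_{H+} J^d_{H-})(J^d_{H-} J^d_{H+})(1 - J^d_{V+} J^d_{V-})(1 - J^d_{V-} J^d_{V+})
\tag{26}
$$

The coefficient $c_6$ takes the following values:

$$
\begin{array}{ll}
\text{for Met} \to \text{Met + Ile} & c_6 = 12c_2 \\
\text{for Lys} \to \text{Lys + Asn} & c_6 = -8c_1 \\
\text{for Trp} \to \text{Trp + Cys} & c_6 = 12c_2 \\
\text{for Trp} \to \text{Trp + Ter} & c_6 = 6c_1 + 6c_2 \\
\text{for Ter} \to \text{Ter + Tyr} & c_6 = -8c_1
\end{array}
\tag{27}
$$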

• in the case of the CNC and BNC codes, the Ter doublet is changed into Gln as follows:

Ter → Gln for the CNC code by the term

$$
-4c_1\, \mathcal{P}_{UA}\, (\tfrac12 - J^{(3)}_{V,3})
\tag{28}
$$

Ter → Ter + Gln for the BNC code by the term

$$
-4c_1\, \mathcal{P}_{UA}\, (\tfrac12 - J^{(3)}_{V,3})(\tfrac12 + J^{(3)}_{H,3})
\tag{29}
$$

• in the case of the alternative YNC code, the Leu quartet CUN is split into a triplet Leu
coded by (CUC, CUU, CUA) and a singlet Ser coded by (CUG). The corresponding term
in the reading operator is

$$
2c_1\, \mathcal{P}_{CU}\, (\tfrac12 - J^{(3)}_{V,3})(\tfrac12 + J^{(3)}_{H,3})
\tag{30}
$$

where the projector $\mathcal{P}_{CU}$ is given by

$$
\mathcal{P}_{CU} = (1 - J^d_{H+} J^d_{H-})(1 - J^d_{H-} J^d_{H+})(J^d_{V+} J^d_{V-})(1 - J^d_{V-} J^d_{V+})
\tag{31}
$$

• in the case of the Yeast Mitochondrial Code, the quartet CUN codes the amino-acid
Thr rather than Leu. This change is achieved by multiplying the quartets term (8) by
$(1 + 2\mathcal{P}_{CU})$ for the horizontal part and by $(1 - 4\mathcal{P}_{CU})$ for the vertical part.
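The coefficients of the correction terms follow from simple differences of eigenvalues, since each correction term acts with weight 1 on the codons it is meant to reassign. The Python sketch below reproduces this arithmetic; eigenvalues are stored as pairs $(x, y)$ standing for $x\,c_1 + y\,c_2$ and copied from the prototype eigenvalues (16), and the helper name `shift` is hypothetical.

```python
# Prototype eigenvalues x*c1 + y*c2, stored as (x, y); "X" is the unassigned doublet AGR.
VALUE = {
    "Arg": (-1, 1), "Ser": (3, -1), "Gly": (-1, 5), "Ter": (9, 1),
    "Ile": (5, 9), "Asn": (9, 5), "Cys": (3, 7), "Tyr": (1, 1),
    "Gln": (5, 1), "Thr": (3, 3), "Leu": (1, -1),
    "Met": (5, -3), "Lys": (17, 5), "Trp": (3, -5), "X": (3, -13),
}


def shift(source, target):
    """Coefficient (x, y) moving the eigenvalue of `source` onto that of `target`."""
    (x0, y0), (x1, y1) = VALUE[source], VALUE[target]
    return (x1 - x0, y1 - y0)


# Reassignment of the doublet AGR.
c5 = {aa: shift("X", aa) for aa in ("Arg", "Ser", "Gly", "Ter")}

# Reassignment of the codon ending in A inside a doublet.
c6 = {
    "Met -> Met + Ile": shift("Met", "Ile"),
    "Lys -> Lys + Asn": shift("Lys", "Asn"),
    "Trp -> Trp + Cys": shift("Trp", "Cys"),
    "Trp -> Trp + Ter": shift("Trp", "Ter"),
    "Ter -> Ter + Tyr": shift("Ter", "Tyr"),
}

for name, (x, y) in {**c5, **c6}.items():
    print(f"{name}: {x} c1 + {y} c2")
print("Ter -> Gln (CNC, BNC):", shift("Ter", "Gln"))
print("CUG: Leu -> Ser (alt. YNC):", shift("Leu", "Ser"))
```

The same bookkeeping gives the factors of the Yeast Mitochondrial Code: the horizontal part of CUN must go from $c_1$ to $3c_1$ and the vertical part from $-c_2$ to $3c_2$, i.e. factors 3 and $-3$ on the states selected by $\mathcal{P}_{CU}$.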

3.2.1 The Eukaryotic Code (EC)

The Eukaryotic Code is the most important one and is often referred to as the universal code.
The differences between the Eukaryotic Code and the prototype code are the following:

codon   prototype code   EC        codon   prototype code   EC
AUG     Met              Met       AUA     Met              Ile
AGG     X                Arg       AGA     X                Arg
UGG     Trp              Trp       UGA     Trp              Ter


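Hence from (19), (21), (22) and (27), the reading operator for the Eukaryotic Code is

$$
\begin{aligned}
\mathcal{R}_{EC} = {} & \tfrac43 c_1 C_H + \tfrac43 c_2 C_V - 4c_1 \mathcal{P}_H J_{H,3} - 4c_2 \mathcal{P}_V J_{V,3}
+ \left( -8c_1 \mathcal{P}_D + (8c_1 + 12c_2) \mathcal{P}_S \right) J_{V,3} \\
& + (-4c_1 + 14c_2)\, \mathcal{P}_{AG}\, (\tfrac12 - J^{(3)}_{V,3}) \\
& + \left[ 12c_2\, \mathcal{P}_{AU} + (6c_1 + 6c_2)\, \mathcal{P}_{UG} \right] (\tfrac12 - J^{(3)}_{V,3})(\tfrac12 - J^{(3)}_{H,3})
\end{aligned}
\tag{32}
$$

3.2.2 The Vertebrate Mitochondrial Code (VMC)
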



The Vertebrate Mitochondrial Code is used in the mitochondria of vertebrates. The differences
between the Vertebrate Mitochondrial Code and the prototype code are the following:

codon   prototype code   VMC       codon   prototype code   VMC
AGG     X                Ter       AGA     X                Ter

Hence from (19) and (21), the reading operator for the Vertebrate Mitochondrial Code is

$$
\begin{aligned}
\mathcal{R}_{VMC} = {} & \tfrac43 c_1 C_H + \tfrac43 c_2 C_V - 4c_1 \mathcal{P}_H J_{H,3} - 4c_2 \mathcal{P}_V J_{V,3}
+ \left( -8c_1 \mathcal{P}_D + (8c_1 + 12c_2) \mathcal{P}_S \right) J_{V,3} \\
& + (6c_1 + 14c_2)\, \mathcal{P}_{AG}\, (\tfrac12 - J^{(3)}_{V,3})
\end{aligned}
\tag{33}
$$

3.2.3 The Yeast Mitochondrial Code (YMC)

The Yeast Mitochondrial Code is used in the mitochondria of yeasts such as Saccharomyces,
Candida, etc. The differences between the Yeast Mitochondrial Code and the prototype code
are the following:

codon   prototype code   YMC       codon   prototype code   YMC
CUC     Leu              Thr       CUU     Leu              Thr
CUG     Leu              Thr       CUA     Leu              Thr
AGG     X                Arg       AGA     X                Arg

Hence from (19) and (21), the reading operator for the Yeast Mitochondrial Code is

$$
\begin{aligned}
\mathcal{R}_{YMC} = {} & \left( \tfrac43 c_1 C_H - 4c_1 \mathcal{P}_H J_{H,3} \right)(1 + 2\mathcal{P}_{CU})
+ \left( \tfrac43 c_2 C_V - 4c_2 \mathcal{P}_V J_{V,3} \right)(1 - 4\mathcal{P}_{CU}) \\
& + \left( -8c_1 \mathcal{P}_D + (8c_1 + 12c_2) \mathcal{P}_S \right) J_{V,3}
+ (-4c_1 + 14c_2)\, \mathcal{P}_{AG}\, (\tfrac12 - J^{(3)}_{V,3})
\end{aligned}
\tag{34}
$$

3.2.4 The Invertebrate Mitochondrial Code (IMC)

The Invertebrate Mitochondrial Code is used in the mitochondria of some arthropoda, mollusca,
nematoda and insecta. The differences between the Invertebrate Mitochondrial Code and the
prototype code are the following:

codon   prototype code   IMC       codon   prototype code   IMC
AGG     X                Ser       AGA     X                Ser

Hence from (19) and (21), the reading operator for the Invertebrate Mitochondrial Code is

$$
\begin{aligned}
\mathcal{R}_{IMC} = {} & \tfrac43 c_1 C_H + \tfrac43 c_2 C_V - 4c_1 \mathcal{P}_H J_{H,3} - 4c_2 \mathcal{P}_V J_{V,3}
+ \left( -8c_1 \mathcal{P}_D + (8c_1 + 12c_2) \mathcal{P}_S \right) J_{V,3} \\
& + 12c_2\, \mathcal{P}_{AG}\, (\tfrac12 - J^{(3)}_{V,3})
\end{aligned}
\tag{35}
$$

3.2.5 The Protozoan Mitochondrial and Mycoplasma Code (PMC)

The Protozoan Mitochondrial and Mycoplasma Code is used in the mitochondria of some
protozoa (leishmania, paramecia, trypanosoma, etc.) and for many fungi. The differences
between the Protozoan Mitochondrial and Mycoplasma Code and the prototype code are the
following:

codon   prototype code   PMC       codon   prototype code   PMC
AUG     Met              Met       AUA     Met              Ile
AGG     X                Arg       AGA     X                Arg


Hence from (19), (21), (22) and (27), the reading operator for the Protozoan Mitochondrial
Code is

$$
\begin{aligned}
\mathcal{R}_{PMC} = {} & \tfrac43 c_1 C_H + \tfrac43 c_2 C_V - 4c_1 \mathcal{P}_H J_{H,3} - 4c_2 \mathcal{P}_V J_{V,3}
+ \left( -8c_1 \mathcal{P}_D + (8c_1 + 12c_2) \mathcal{P}_S \right) J_{V,3} \\
& + (-4c_1 + 14c_2)\, \mathcal{P}_{AG}\, (\tfrac12 - J^{(3)}_{V,3})
+ 12c_2\, \mathcal{P}_{AU}\, (\tfrac12 - J^{(3)}_{V,3})(\tfrac12 - J^{(3)}_{H,3})
\end{aligned}
\tag{36}
$$

3.2.6 The Echinoderm Mitochondrial Code (EMC)



The Echinoderm Mitochondrial Code is used in the mitochondria of some asterozoa and
echinozoa. The differences between the Echinoderm Mitochondrial Code and the prototype code
are the following:

codon   prototype code   EMC       codon   prototype code   EMC
AUG     Met              Met       AUA     Met              Ile
AGG     X                Ser       AGA     X                Ser
AAG     Lys              Lys       AAA     Lys              Asn

Hence from (19), (21), (22) and (27), the reading operator for the Echinoderm Mitochondrial
Code is

$$
\begin{aligned}
\mathcal{R}_{EMC} = {} & \tfrac43 c_1 C_H + \tfrac43 c_2 C_V - 4c_1 \mathcal{P}_H J_{H,3} - 4c_2 \mathcal{P}_V J_{V,3}
+ \left( -8c_1 \mathcal{P}_D + (8c_1 + 12c_2) \mathcal{P}_S \right) J_{V,3} \\
& + 12c_2\, \mathcal{P}_{AG}\, (\tfrac12 - J^{(3)}_{V,3})
+ \left[ 12c_2\, \mathcal{P}_{AU} - 8c_1\, \mathcal{P}_{AA} \right] (\tfrac12 - J^{(3)}_{V,3})(\tfrac12 - J^{(3)}_{H,3})
\end{aligned}
\tag{37}
$$

3.2.7 The Ascidian Mitochondrial Code (AMC)

The Ascidian Mitochondrial Code is used in the mitochondria of some ascidiacea. The
differences between the Ascidian Mitochondrial Code and the prototype code are the following:

codon   prototype code   AMC       codon   prototype code   AMC
AGG     X                Gly       AGA     X                Gly

Hence from (19) and (21), the reading operator for the Ascidian Mitochondrial Code is

$$
\begin{aligned}
\mathcal{R}_{AMC} = {} & \tfrac43 c_1 C_H + \tfrac43 c_2 C_V - 4c_1 \mathcal{P}_H J_{H,3} - 4c_2 \mathcal{P}_V J_{V,3}
+ \left( -8c_1 \mathcal{P}_D + (8c_1 + 12c_2) \mathcal{P}_S \right) J_{V,3} \\
& + (-4c_1 + 18c_2)\, \mathcal{P}_{AG}\, (\tfrac12 - J^{(3)}_{V,3})
\end{aligned}
\tag{38}
$$

3.2.8 The Flatworm Mitochondrial Code (FMC)

The Flatworm Mitochondrial Code is used in the mitochondria of the flatworms. The
differences between the Flatworm Mitochondrial Code and the prototype code are the following:

codon   prototype code   FMC       codon   prototype code   FMC
UAG     Ter              Ter       UAA     Ter              Tyr
AUG     Met              Met       AUA     Met              Ile
AGG     X                Ser       AGA     X                Ser
AAG     Lys              Lys       AAA     Lys              Asn


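Hence from (19), (21), (22) and (27), the reading operator for the Flatworm Mitochondrial
Code is

$$
\begin{aligned}
\mathcal{R}_{FMC} = {} & \tfrac43 c_1 C_H + \tfrac43 c_2 C_V - 4c_1 \mathcal{P}_H J_{H,3} - 4c_2 \mathcal{P}_V J_{V,3}
+ \left( -8c_1 \mathcal{P}_D + (8c_1 + 12c_2) \mathcal{P}_S \right) J_{V,3} \\
& + 12c_2\, \mathcal{P}_{AG}\, (\tfrac12 - J^{(3)}_{V,3})
+ \left[ 12c_2\, \mathcal{P}_{AU} - 8c_1\, \mathcal{P}_{AA} - 8c_1\, \mathcal{P}_{UA} \right] (\tfrac12 - J^{(3)}_{V,3})(\tfrac12 - J^{(3)}_{H,3})
\end{aligned}
\tag{39}
$$

3.2.9 The Ciliate Nuclear Code (CNC)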



The Ciliate Nuclear Code is used in the nuclei of some ciliata, dasycladaceae and diplomonadida.
The differences between the Ciliate Nuclear Code and the prototype code are the following:

codon   prototype code   CNC       codon   prototype code   CNC
UGG     Trp              Trp       UGA     Trp              Ter
UAG     Ter              Gln       UAA     Ter              Gln
AUG     Met              Met       AUA     Met              Ile
AGG     X                Arg       AGA     X                Arg

Hence from (19), (21), (22), (27) and (28), the reading operator for the Ciliate Nuclear Code is

$$
\begin{aligned}
\mathcal{R}_{CNC} = {} & \tfrac43 c_1 C_H + \tfrac43 c_2 C_V - 4c_1 \mathcal{P}_H J_{H,3} - 4c_2 \mathcal{P}_V J_{V,3}
+ \left( -8c_1 \mathcal{P}_D + (8c_1 + 12c_2) \mathcal{P}_S \right) J_{V,3} \\
& + \left[ (-4c_1 + 14c_2)\, \mathcal{P}_{AG} - 4c_1\, \mathcal{P}_{UA} \right] (\tfrac12 - J^{(3)}_{V,3}) \\
& + \left[ 12c_2\, \mathcal{P}_{AU} + (6c_1 + 6c_2)\, \mathcal{P}_{UG} \right] (\tfrac12 - J^{(3)}_{V,3})(\tfrac12 - J^{(3)}_{H,3})
\end{aligned}
\tag{40}
$$



3.2.10 The Blepharisma Nuclear Code (BNC)

The Blepharisma Nuclear Code is used in the nuclei of the blepharisma (ciliata) (note that
this code is very close to the CNC which is used for the ciliata). The differences between the
Blepharisma Nuclear Code and the prototype code are the following:

codon   prototype code   BNC       codon   prototype code   BNC
UGG     Trp              Trp       UGA     Trp              Ter
UAG     Ter              Gln       UAA     Ter              Ter
AUG     Met              Met       AUA     Met              Ile
AGG     X                Arg       AGA     X                Arg

Hence from (19), (21), (22), (27) and (29), the reading operator for the Blepharisma Nuclear
Code is

$$
\begin{aligned}
\mathcal{R}_{BNC} = {} & \tfrac43 c_1 C_H + \tfrac43 c_2 C_V - 4c_1 \mathcal{P}_H J_{H,3} - 4c_2 \mathcal{P}_V J_{V,3}
+ \left( -8c_1 \mathcal{P}_D + (8c_1 + 12c_2) \mathcal{P}_S \right) J_{V,3} \\
& + (-4c_1 + 14c_2)\, \mathcal{P}_{AG}\, (\tfrac12 - J^{(3)}_{V,3})
- 4c_1\, \mathcal{P}_{UA}\, (\tfrac12 - J^{(3)}_{V,3})(\tfrac12 + J^{(3)}_{H,3}) \\
& + \left[ 12c_2\, \mathcal{P}_{AU} + (6c_1 + 6c_2)\, \mathcal{P}_{UG} \right] (\tfrac12 - J^{(3)}_{V,3})(\tfrac12 - J^{(3)}_{H,3})
\end{aligned}
\tag{41}
$$



3.2.11 The Euplotid Nuclear Code (ENC)

The Euplotid Nuclear Code is used in the nuclei of the euplotidae (ciliata). The differences
between the Euplotid Nuclear Code and the prototype code are the following:

codon   prototype code   ENC       codon   prototype code   ENC
UGG     Trp              Trp       UGA     Trp              Cys
AUG     Met              Met       AUA     Met              Ile
AGG     X                Arg       AGA     X                Arg


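Hence from (19), (21), (22) and (27), the reading operator for the Euplotid Nuclear Code is

$$
\begin{aligned}
\mathcal{R}_{ENC} = {} & \tfrac43 c_1 C_H + \tfrac43 c_2 C_V - 4c_1 \mathcal{P}_H J_{H,3} - 4c_2 \mathcal{P}_V J_{V,3}
+ \left( -8c_1 \mathcal{P}_D + (8c_1 + 12c_2) \mathcal{P}_S \right) J_{V,3} \\
& + (-4c_1 + 14c_2)\, \mathcal{P}_{AG}\, (\tfrac12 - J^{(3)}_{V,3})
+ 12c_2 \left( \mathcal{P}_{AU} + \mathcal{P}_{UG} \right) (\tfrac12 - J^{(3)}_{V,3})(\tfrac12 - J^{(3)}_{H,3})
\end{aligned}
\tag{42}
$$

3.2.12 The alternative Yeast Nuclear Code (alt. YNC)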

The alternative Yeast Nuclear Code is used in the nuclei of some yeasts (essentially many
candidae). The differences between the alternative Yeast Nuclear Code and the prototype code
are the following:

codon   prototype code   alt. YNC  codon   prototype code   alt. YNC
CUG     Leu              Ser       CUA     Leu              Leu
UGG     Trp              Trp       UGA     Trp              Ter
AUG     Met              Met       AUA     Met              Ile
AGG     X                Arg       AGA     X                Arg

Hence from (19), (21), (22), (27) and (30), the reading operator for the alternative Yeast Nuclear
Code is

$$
\begin{aligned}
\mathcal{R}_{aYNC} = {} & \tfrac43 c_1 C_H + \tfrac43 c_2 C_V - 4c_1 \mathcal{P}_H J_{H,3} - 4c_2 \mathcal{P}_V J_{V,3}
+ \left( -8c_1 \mathcal{P}_D + (8c_1 + 12c_2) \mathcal{P}_S \right) J_{V,3} \\
& + (-4c_1 + 14c_2)\, \mathcal{P}_{AG}\, (\tfrac12 - J^{(3)}_{V,3})
+ 2c_1\, \mathcal{P}_{CU}\, (\tfrac12 - J^{(3)}_{V,3})(\tfrac12 + J^{(3)}_{H,3}) \\
& + \left[ (6c_1 + 6c_2)\, \mathcal{P}_{UG} + 12c_2\, \mathcal{P}_{AU} \right] (\tfrac12 - J^{(3)}_{V,3})(\tfrac12 - J^{(3)}_{H,3})
\end{aligned}
\tag{43}
$$



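3.3 Reading values for the amino-acids

We have therefore constructed reading operators for the genetic codes specified above, starting
from a prototype code that emphasizes the quartet/doublet structure of the different codes.
The different reading operators are such that they give the same value for a given amino-acid,
whatever the code under consideration. Finally, we get the following eigenvalues of the reading
operators for the amino-acids (after a rescaling, setting $c \equiv c_1/c_2$):

$$
\begin{array}{ll|ll|ll}
\text{a.a.} & \text{value of } \mathcal{R} & \text{a.a.} & \text{value of } \mathcal{R} & \text{a.a.} & \text{value of } \mathcal{R} \\
\hline
\text{Ala} & -c + 3 & \text{Gly} & -c + 5 & \text{Pro} & -c - 1 \\
\text{Arg} & -c + 1 & \text{His} & -3c + 1 & \text{Ser} & 3c - 1 \\
\text{Asn} & 9c + 5 & \text{Ile} & 5c + 9 & \text{Thr} & 3c + 3 \\
\text{Asp} & 5c + 5 & \text{Leu} & c - 1 & \text{Trp} & 3c - 5 \\
\text{Cys} & 3c + 7 & \text{Lys} & 17c + 5 & \text{Tyr} & c + 1 \\
\text{Gln} & 5c + 1 & \text{Met} & 5c - 3 & \text{Val} & c + 3 \\
\text{Glu} & 13c + 5 & \text{Phe} & -7c - 1 & \text{Ter} & 9c + 1
\end{array}
\tag{44}
$$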



Note that the reading operators $\mathcal{R}(c)$ can be used for any real value of c, except those 
conferring the same eigenvalue to codons relative to two different amino-acids. These forbidden 
values form a finite set of rational numbers lying between -7 and 5: 
-7, -5, -4, -3, ..., -2, ..., -1, ..., 3, 4, 5. 
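The forbidden values can be recomputed from table (44). The following is a minimal sketch, assuming that a value of c is forbidden exactly when two of the linear expressions a*c + b listed in (44) coincide; the dictionary `READING_VALUES` and the function name are illustrative, and the resulting list depends on the table as transcribed here.

```python
from fractions import Fraction
from itertools import combinations

# Eigenvalues a*c + b of the reading operator, stored as (a, b),
# transcribed from table (44).
READING_VALUES = {
    "Ala": (-1, 3), "Gly": (-1, 5), "Pro": (-1, -1),
    "Arg": (-1, 1), "His": (-3, 1), "Ser": (3, -1),
    "Asn": (9, 5), "Ile": (5, 9), "Thr": (3, 3),
    "Asp": (5, 5), "Leu": (1, -1), "Trp": (3, -5),
    "Cys": (3, 7), "Lys": (17, 5), "Tyr": (1, 1),
    "Gln": (5, 1), "Met": (5, -3), "Val": (1, 3),
    "Glu": (13, 5), "Phe": (-7, -1), "Ter": (9, 1),
}


def forbidden_values(values=READING_VALUES):
    """Sorted values of c at which two amino-acids get the same eigenvalue."""
    found = set()
    for (a1, b1), (a2, b2) in combinations(values.values(), 2):
        if a1 != a2:  # equal slopes with different offsets never coincide
            # a1*c + b1 = a2*c + b2  =>  c = (b2 - b1) / (a1 - a2)
            found.add(Fraction(b2 - b1, a1 - a2))
    return sorted(found)


if __name__ == "__main__":
    values = forbidden_values()
    print(len(values), "forbidden values, from", values[0], "to", values[-1])
    print(", ".join(str(v) for v in values))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that coinciding values are merged without floating-point tolerance.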



At this point, let us emphasize the specific properties of our model. Each nucleotide is 
assigned specific quantum numbers characterizing its purine/pyrimidine origin and involving 
the complementarity rule. Ordered sequences of bases can then be constructed and characterized 
in this framework. Ordered sequences of three bases have just been examined above, and the 
codon/amino-acid correspondence is represented by the reading operator $\mathcal{R}$. Finally, let us 
remark that the coefficients $c_i$, which have been taken as constants above, can more generally 
be considered as functions of some external variables (biological, physical and chemical 
environment, time, etc.). In this way it is possible to explain the observed discrepancy in the 
codon/amino-acid correspondence in biological species under stress conditions (in vitro). The 
evolution of the genetic code can also be discussed in this scheme. However, we believe 
that a better understanding of the reasons for this evolution, i.e. of which kind of optimization 
process takes place, has still to be acquired. 

4 Physical properties of the dinucleotides 

The model we have at hand, with nucleotides characterized by quantum numbers, is well 
adapted to elaborate formulae expressing biophysical properties. A particularly interesting 
quantity is the free energy released by base pairing in double-stranded RNA. The data are not 
provided for a doublet of nucleotides with one item in each strand, but for a pair of nucleotides 
lying on one strand, e.g. CG, coupled with another pair, e.g. GC, on the second strand. Note 
also that, the direction on a strand being perfectly defined, the energy released by the 
doublet sequence CG on the first strand running from 5' to 3', paired with the doublet GC on the 
complementary strand running from 3' to 5', differs from the one released by the doublet 
GC, itself associated with CG. Clearly such quantities involve pairs of nucleotides, 
and the naturally ordered crystal bases obtained from the tensor product of two representations 
are adapted to such a calculation. 

We will also consider two other quantities that again involve pairs of nucleotides, namely the 
relative hydrophilicity $R_f$ and hydrophobicity $R_x$ of dinucleosides. 

Before presenting our results, let us mention that fits of the same biophysical properties 
can be found in a recent preprint [5], where polynomials in 4 or 6 coordinates in the 64-codon 
space are constructed. In their approach, the authors associate two coordinates (d, m) with each 
nucleotide of any codon, as follows: A = (-1, 0), C = (0, -1), G = (0, 1), U = (1, 0), labelling 
in this way each codon with 6 numbers. This labelling of the nucleotides is related to our 
labels of Eq. (2) in the following way: 

nucleotide    d                       m
C             J_{V,3} - J_{H,3}       -(J_{V,3} + J_{H,3})
U             J_{V,3} - J_{H,3}       -(J_{V,3} + J_{H,3})
G             J_{V,3} + J_{H,3}       J_{V,3} - J_{H,3}
A             J_{V,3} + J_{H,3}       J_{V,3} - J_{H,3}

Therefore the labels (d, m) just correspond, up to a sign, for the pyrimidines (resp. purines) to 
the antidiagonal and diagonal (resp. diagonal and antidiagonal) $U_{q \to 0}(sl(2))$. 
In the following we compare our results with those of [5]. 
Free energy 

In [1] we fitted the experimental data with a four-parameter operator. Here we fit the 
more recent data [6] with a two-parameter operator obtained from the one used in [1] by setting 
two parameters to zero: 

$$ \Delta G^0_{37} = \alpha_0 + \alpha_1 (C_H + C_V) J^d_{3,H} \tag{45} $$

Using a least-squares fit, one finds for the coefficients $\alpha_i$: 

$$ \alpha_0 = -2.14, \qquad \alpha_1 = -0.295 \tag{46} $$

The standard deviation of the two-parameter fit (46) is equal to 0.149, to be compared 
with the standard deviation 0.16 of the four-parameter fit of ref. [5]. The experimental 
and fitted values of the free energies $\Delta G^0_{37}$ of the dinucleotides are displayed in Table 1. 



dinucleotide    experimental    fitted
CA              -2.1            -2.14
CG              -2.4            -2.73
UG              -2.1            -2.14
UA              -1.3            -1.55
CU              -2.1            -2.14
CC              -3.3            -3.32
UC              -2.4            -2.14
UU              -0.9            -0.96
GU              -2.2            -2.14
GC              -3.4            -3.32
AC              -2.2            -2.14
AU              -1.1            -0.96
GA              -2.4            -2.14
GG              -3.3            -3.32
AG              -2.1            -2.14
AA              -0.9            -0.96

Table 1: Dinucleotide free energies $\Delta G^0_{37}$. 
The first (resp. second) column gives the experimental (resp. fitted) values. 
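The two-parameter fit can be reproduced from the experimental column of Table 1. The sketch below makes two assumptions that are not spelled out in this section: the nucleotide labels (J_{H,3}, J_{V,3}) are those of Eq. (2), i.e. C = (+1/2, +1/2), U = (-1/2, +1/2), G = (+1/2, -1/2), A = (-1/2, -1/2), and an ordered pair of spin-1/2 states is a crystal-basis singlet only when +1/2 comes first and -1/2 second. All function names are illustrative.

```python
from math import sqrt

# (J_{H,3}, J_{V,3}) labels of the nucleotides, assumed to follow Eq. (2).
SPIN = {"C": (0.5, 0.5), "U": (-0.5, 0.5), "G": (0.5, -0.5), "A": (-0.5, -0.5)}

# Experimental free energies, transcribed from Table 1.
EXPERIMENTAL = {
    "CA": -2.1, "CG": -2.4, "UG": -2.1, "UA": -1.3,
    "CU": -2.1, "CC": -3.3, "UC": -2.4, "UU": -0.9,
    "GU": -2.2, "GC": -3.4, "AC": -2.2, "AU": -1.1,
    "GA": -2.4, "GG": -3.3, "AG": -2.1, "AA": -0.9,
}


def pair_spin(first, second):
    """Spin J of an ordered pair of spin-1/2 states in the crystal basis.

    Assumed rule: the ordered pair (+1/2, -1/2) is the singlet (J = 0);
    the three other ordered pairs span the triplet (J = 1).
    """
    return 0 if (first > 0 and second < 0) else 1


def operator_value(dinucleotide):
    """Eigenvalue of (C_H + C_V) * J^d_{3,H} on an ordered dinucleotide."""
    (h1, v1), (h2, v2) = (SPIN[base] for base in dinucleotide)
    j_h, j_v = pair_spin(h1, h2), pair_spin(v1, v2)
    casimir_sum = j_h * (j_h + 1) + j_v * (j_v + 1)
    return casimir_sum * (h1 + h2)


def fit_free_energy(data=EXPERIMENTAL):
    """Least-squares fit of the data to alpha_0 + alpha_1 * operator_value."""
    xs = [operator_value(d) for d in data]
    ys = list(data.values())
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    alpha_1 = sxy / sxx
    alpha_0 = y_mean - alpha_1 * x_mean
    # Root-mean-square residual over the 16 dinucleotides.
    rms = sqrt(sum((y - alpha_0 - alpha_1 * x) ** 2 for x, y in zip(xs, ys)) / n)
    return alpha_0, alpha_1, rms


if __name__ == "__main__":
    print("alpha_0 = %.3f, alpha_1 = %.3f, rms = %.3f" % fit_free_energy())
```

With the rounded data of Table 1 the fitted coefficients should come out close to (46), and the root-mean-square residual close to the quoted standard deviation; small differences in the last digit are expected from rounding.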

Hydrophilicity 

We fit the values of the relative hydrophilicity $R_f$ of the 16 dinucleoside monophosphates [7] 
with the following four-parameter operator: 

$$ R_f = \alpha_0 + \alpha_1 C_V + \alpha_2 J^d_{3,V} + \alpha_3 \sum_{i=1,2} (J^i_{3,H} + J^i_{3,V})(J^i_{3,H} + J^i_{3,V} - 1) \tag{47} $$

where the label d refers to the dinucleoside state and the label i to its i-th nucleoside 
(the last term, in $\alpha_3$, is equal to 4 for AA, to 2 for the dinucleosides containing a single A, 
i.e. CA, GA, UA, AC, AG and AU, and to zero for the other dinucleosides, as can be checked on 
the fitted values of Table 2). 

Using a least-squares fit, one finds for the coefficients $\alpha_i$: 

$$ \alpha_0 = 0.135, \quad \alpha_1 = 0.036, \quad \alpha_2 = 0.147, \quad \alpha_3 = -0.016 \tag{48} $$

The standard deviation of the four-parameter fit (48) is equal to 0.027, to be 
compared with the standard deviation 0.033 of the six-parameter fit of ref. [5]. The 
experimental and fitted values of the hydrophilicity $R_f$ of the dinucleosides are displayed in 
Table 2. 
dinucleoside    experimental    fitted
CA              0.083           0.103
CG              0.146           0.135
UG              0.160           0.135
UA              0.090           0.103
CU              0.359           0.354
CC              0.349           0.354
UC              0.378           0.354
UU              0.389           0.354
GU              0.224           0.207
GC              0.193           0.207
AC              0.118           0.175
AU              0.112           0.175
GA              0.035           0.028
GG              0.065           0.060
AG              0.048           0.028
AA              0.023           -0.004

Table 2: Dinucleoside relative hydrophilicities $R_f$. 
The first (resp. second) column gives the experimental (resp. fitted) values. 

Hydrophobicity 

We fit the values of the relative hydrophobicity $R_x$ of the 16 dinucleoside monophosphates as 
reported in [8] with the following four-parameter operator: 

$$ R_x = \alpha_0 + \alpha_1 J^d_{3,V} + \alpha_2 J^d_{3,H} + \alpha_3 \left[ (J^1_{3,H} + J^1_{3,V})^2 + (J^2_{3,H} + J^2_{3,V})^2 \right] \tag{49} $$

(the last term, in $\alpha_3$, is equal to 2 for AA, AC, CA and CC, to 1 for AU, AG, UA, UC, GC, GA, 
CU and CG, and to zero for UU, UG, GU, GG). 

Using a least-squares fit, one finds for the coefficients $\alpha_i$: 

$$ \alpha_0 = 0.294, \quad \alpha_1 = -0.240, \quad \alpha_2 = -0.105, \quad \alpha_3 = 0.136 \tag{50} $$

Using a least-squares fit without the dinucleoside AA, one finds new coefficients $\alpha_i$, which lead 
to better values of $R_x$ for the remaining dinucleosides: 

$$ \alpha_0 = 0.309, \quad \alpha_1 = -0.203, \quad \alpha_2 = -0.068, \quad \alpha_3 = 0.099 \tag{51} $$

The standard deviation of the four-parameter fit (50) is equal to 0.049, which is the same as that of 
the four-parameter fit of ref. [5]. Using the fit (51), the standard deviation becomes 0.074 
(including the value for AA) or 0.024 (excluding the value for AA). In this last case, the 
standard deviation of ref. [5] is still equal to 0.031. The experimental and fitted values (second 
fit) of the relative hydrophobicity $R_x$ of the dinucleosides are displayed in Table 3. 

dinucleoside    experimental    fitted
CA              0.494           0.507
CG              0.326           0.340
UG              0.291           0.309
UA              0.441           0.476
CU              0.218           0.205
CC              0.244           0.236
UC              0.218           0.205
UU              0.194           0.174
GU              0.291           0.309
GC              0.326           0.340
AC              0.494           0.507
AU              0.441           0.476
GA              0.660           0.611
GG              0.436           0.444
AG              0.660           0.611
AA              1               0.778

Table 3: Dinucleoside relative hydrophobicities $R_x$. 
The first (resp. second) column gives the experimental (resp. fitted) values. 
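As a consistency check, the fitted columns of Tables 2 and 3 can be regenerated from the operators (47) and (49) with the coefficients (48) and (51). This is a sketch under the same assumptions as above, namely the nucleotide labels of Eq. (2) and the crystal-basis rule that an ordered pair (+1/2, -1/2) is a singlet; the function names are hypothetical.

```python
# (J_{H,3}, J_{V,3}) labels of the nucleosides, assumed to follow Eq. (2).
SPIN = {"C": (0.5, 0.5), "U": (-0.5, 0.5), "G": (0.5, -0.5), "A": (-0.5, -0.5)}


def pair_spin(first, second):
    """Crystal-basis spin of an ordered pair: (+1/2, -1/2) is the singlet."""
    return 0 if (first > 0 and second < 0) else 1


def hydrophilicity(pair, a0=0.135, a1=0.036, a2=0.147, a3=-0.016):
    """Operator (47) evaluated with the coefficients (48)."""
    (h1, v1), (h2, v2) = (SPIN[base] for base in pair)
    j_v = pair_spin(v1, v2)
    casimir_v = j_v * (j_v + 1)
    last = sum((h + v) * (h + v - 1) for h, v in ((h1, v1), (h2, v2)))
    return a0 + a1 * casimir_v + a2 * (v1 + v2) + a3 * last


def hydrophobicity(pair, a0=0.309, a1=-0.203, a2=-0.068, a3=0.099):
    """Operator (49) evaluated with the coefficients (51), i.e. the second fit."""
    (h1, v1), (h2, v2) = (SPIN[base] for base in pair)
    last = sum((h + v) ** 2 for h, v in ((h1, v1), (h2, v2)))
    return a0 + a1 * (v1 + v2) + a2 * (h1 + h2) + a3 * last


if __name__ == "__main__":
    for first in "CUGA":
        for second in "CUGA":
            pair = first + second
            print(pair, "%.3f" % hydrophilicity(pair), "%.3f" % hydrophobicity(pair))
```

Both functions depend only on quantum numbers of the ordered pair, which is why several dinucleosides share the same fitted value in the tables.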

5 Universal behaviour of ratios of codon usage frequency 

In the following the labels X, J, Z, K represent any of the 4 bases C, U, G, A. Let XJZ be 
a codon in a given multiplet, say $m_i$, encoding an amino-acid, say $A_i$. We define the probability of 
usage of the codon XJZ as the ratio between the frequency of usage $n_Z$ of the codon XJZ 
in the biosynthesis of $A_i$ and the total number n of synthesized $A_i$, i.e. as the relative codon 
frequency, in the limit of very large n. 

It is natural to assume that the usage frequency of a codon in a multiplet is connected to 
its probability of usage $P(XJZ \to A_i)$. We define [2] the branching ratio $B_{ZK}$ as 

$$ B_{ZK} = \frac{P(XJZ \to A_i)}{P(XJK \to A_i)} \tag{52} $$

where XJK is another codon belonging to the same multiplet $m_i$. It is reasonable to argue 
that, in the limit of a very large number of codons, for a fixed biological species and amino-acid, 
the branching ratio depends essentially on the properties of the codon. In our model this means 
that in this limit $B_{ZK}$ is a function, depending on the type of the multiplet, of the quantum 
numbers of the codons XJZ and XJK, i.e. of the labels $J_\alpha$, $J_{\alpha,3}$, where $\alpha$ = H or V, and of 
another set of quantum labels lifting the degeneracy on $J_\alpha$; in Table 4 different irreducible 
representations with the same values of $J_\alpha$ are distinguished by an upper label. 

We have exhibited a correlation in the codon usage frequency for the quartets and 
the quartet subpart of the sextets, i.e. the codons in a sextet differing only in the third nucleotide, 
for the vertebrates in [2] and for biological species belonging to the vertebrates, invertebrates, 
plants and fungi in [9], and we have shown that these correlations fit well in our model with the 
assumed dependence of $B_{ZK}$. Here we remark that for thirteen biological species belonging to 
the vertebrate class, with a codon statistics larger than 95,000 (see Table 5), the ratio 

$$ \frac{B_{AG}}{B_{UC}} = \frac{B_{AU}}{B_{GC}} = \frac{P(XJA \to A_i) \, P(XJC \to A_i)}{P(XJG \to A_i) \, P(XJU \to A_i)} \tag{53} $$

for quartets and the quartet subpart of the sextets behaves independently of the specific 
biological species. Moreover, for the same amino-acids for which we have remarked correlations, 
the values of the ratio $B_{AG}/B_{UC}$ are almost the same (see Table 8). We show that this 
behaviour and these correlations find a nice explanation in our model. In Tables 6 and 7, we report 
respectively the values of the branching ratios $B_{AG}$ and $B_{UC}$ as computed from the database 
[10] (release of February 2000), and in Table 8 the ratio of these quantities. The average 
values $\langle B_{AG}/B_{UC} \rangle$, the standard deviations $\sigma$ and the ratios $\sigma / \langle B_{AG}/B_{UC} \rangle$ are displayed in 
the following table: 





                                 Pro     Ala     Thr     Ser     Gly     Val     Leu     Arg
<B_AG/B_UC>                      2.50    2.84    3.30    2.67    2.21    0.33    0.26    1.32
sigma                            0.46    0.53    0.56    0.35    0.30    0.04    0.03    0.14
sigma / <B_AG/B_UC>              0.19    0.19    0.17    0.13    0.14    0.13    0.10    0.11
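The definitions (52) and (53) translate directly into a few lines of code. The sketch below uses made-up codon counts for the four codons XJA, XJC, XJG, XJU of a single quartet in a single species (they are not taken from the database [10]) and checks that the two forms of the ratio in (53) agree; all names are illustrative.

```python
from math import isclose

# Hypothetical counts of the four codons XJN of one quartet, keyed by the
# third nucleotide N. These numbers are invented for illustration only.
EXAMPLE_COUNTS = {"A": 1200, "C": 2500, "G": 800, "U": 1500}


def usage_probabilities(counts):
    """Relative codon frequencies n_Z / n within the multiplet."""
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()}


def branching_ratio(probabilities, z, k):
    """B_ZK = P(XJZ -> A_i) / P(XJK -> A_i), as in Eq. (52)."""
    return probabilities[z] / probabilities[k]


def ratio_53(counts):
    """B_AG / B_UC, the quantity of Eq. (53)."""
    p = usage_probabilities(counts)
    return branching_ratio(p, "A", "G") / branching_ratio(p, "U", "C")


if __name__ == "__main__":
    p = usage_probabilities(EXAMPLE_COUNTS)
    first_form = ratio_53(EXAMPLE_COUNTS)
    second_form = branching_ratio(p, "A", "U") / branching_ratio(p, "G", "C")
    print(first_form, second_form, isclose(first_form, second_form))
```

Since the normalisation n cancels in every branching ratio, the ratio can be computed from raw counts without knowing the total number of synthesized amino-acids.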



The above behaviour can be easily understood by considering a dependence of $B_{ZK}$ not only on 
the irreducible representations to which the codons XJZ and XJK appearing in the numerator 
and the denominator belong, but also on the specific states denoting these codons, and by refining 
the factorized form of [2] as 

$$ B_{ZK} = F_{ZK}(IR(XJZ); IR(XJK)) \, \frac{G_H(b.s.; J_{H,3}(XJZ)) \; G_V(b.s.; J_{V,3}(XJZ))}{G_H(b.s.; J_{H,3}(XJK)) \; G_V(b.s.; J_{V,3}(XJK))} \tag{54} $$

where we have denoted by b.s. the biological species, by $IR(XJZ)$ the irreducible 
representation to which the codon XJZ belongs (see Table 4), and by $J_{\alpha,3}(XJZ)$ the value of the third 
component of the $\alpha$-spin of the state XJZ. Note that we have still neglected the dependence 
on the type of the biosynthesized amino-acid. Using Eq. (54), the ratio $B_{AG}/B_{UC}$ no longer 
depends on the biological species but only on the irreducible representations of the 
codons. Moreover, for Pro, Ala, Thr, Ser (resp. Val and Leu), the irreducible representations 
appearing in the F functions are the same, as can be seen from Table 9, so we expect the same 
value for the ratio. This is indeed the case (see the above table): the value of $B_{AG}/B_{UC}$ for the 
first four amino-acids (resp. for the last two amino-acids) lies in the range 2.90 ± 15% (resp. 
0.30 ± 15%). These values should be compared with the value 1.32 for Arg and 2.21 for Gly. 
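The cancellation of the species dependence can be made explicit. The short derivation below is a sketch assuming the nucleotide labels of Eq. (2), i.e. $J_{H,3} = +1/2$ for C and G, $-1/2$ for U and A, and $J_{V,3} = +1/2$ for C and U, $-1/2$ for G and A; the symbols h and v, denoting the contribution of the first two nucleotides XJ to $J_{H,3}$ and $J_{V,3}$, are introduced here for convenience.

```latex
% Codons XJZ and XJK differ only by the third nucleotide, so
%   J_{H,3}(XJZ) = h + J_{H,3}(Z),   J_{V,3}(XJZ) = v + J_{V,3}(Z).
\begin{align*}
B_{AG} &= F_{AG}\,
  \frac{G_H(b.s.;\, h-\tfrac12)\; G_V(b.s.;\, v-\tfrac12)}
       {G_H(b.s.;\, h+\tfrac12)\; G_V(b.s.;\, v-\tfrac12)}
  = F_{AG}\, \frac{G_H(b.s.;\, h-\tfrac12)}{G_H(b.s.;\, h+\tfrac12)}, \\[4pt]
B_{UC} &= F_{UC}\,
  \frac{G_H(b.s.;\, h-\tfrac12)\; G_V(b.s.;\, v+\tfrac12)}
       {G_H(b.s.;\, h+\tfrac12)\; G_V(b.s.;\, v+\tfrac12)}
  = F_{UC}\, \frac{G_H(b.s.;\, h-\tfrac12)}{G_H(b.s.;\, h+\tfrac12)}, \\[4pt]
\frac{B_{AG}}{B_{UC}} &= \frac{F_{AG}(IR(XJA);\, IR(XJG))}{F_{UC}(IR(XJU);\, IR(XJC))}.
\end{align*}
```

The $G_V$ factors cancel within each branching ratio because A and G (resp. U and C) carry the same $J_{V,3}$, and the remaining $G_H$ factors cancel in the quotient because A and U (resp. G and C) carry the same $J_{H,3}$, so only the representation-dependent F functions survive.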

Let us end this section with the following remark. From the above table, one might be tempted 
to consider the value of the ratio $B_{AG}/B_{UC}$ for Gly as being of the same order of magnitude as the ones 
for Pro, Ala, Thr, Ser. Then one distinguishes, according to this ratio, three groups of codon 
quartets: the one associated with the five amino-acids just mentioned, another one relative to Val 
and Leu, and a last one with Arg. Now, let us look at the dinucleotide pairs constituting the 
first two nucleotides of a codon in the light of our results of sect. 2: the pairs CC, GC, AC, UC 
and GG, relative to Pro, Ala, Thr, Ser and Gly respectively, belong to the representation (1,1) 
of $U_{q \to 0}(sl(2) \oplus sl(2))$; the states GU and CU, relative to Val and Leu respectively, belong to 
the representation (0,1); finally CG, relative to Arg, lies in a different representation, (1,0). 



6 Mutations in the genetic code 

In this section, we present a mathematical framework to describe single-base deletions in 
the genetic code. In [11], starting from the observation that single-base deletions in DNA, 
which occur far more frequently than single-base additions, take place opposite to a 
purine R (R = G, A), i.e. a pyrimidine Y (Y = C, U/T) is deleted, arguments have been 
presented to explain why the Stop codons have the structure they have, see Table 4. We refer 
to that paper for more details and for references to the biological literature on the subject, and 
we recall here just the main ideas and conclusions of [11]. The starting point is the observed 
fact that deletions occur more frequently in the following sequences: YR, TTR, YTG and 
TR. In ref. [11] all these sequences have been refined as YTRV (V = C, A, G). Starting 
from the structure of this dangerous sequence and using the complementarity property, an 
analysis shows that four codons (TAA, TAG, TTA, CTA) are both potential deletion-site 
codons and reverse-complementary potential deletion-site codons. As a mutation at the end of a protein 
chain just implies the addition of further peptides, the authors conclude that the assignment of the 
codons TAA and TAG as Stop codons minimizes the possible deleterious effects of deletions. 
Indeed the codon usage frequency of the dangerous codon CTA, as can be seen from fig. (5) 
of [2] and from fig. (2) of [9], is very low. An analysis of the codon usage frequency exhibits 
an analogous behaviour for the codon TTA. 

The mechanism by which the above sequences are preferred in the deletion process 
is unclear. In the following we present a mathematical scheme in which these properties can 
be accommodated. Let us recall that the Wigner-Eckart theorem has been extended to the quantum 
algebra $U_q(sl(n))$, and recently in [12] to the case of $U_{q \to 0}(sl(2))$. 

In [12] (q → 0)-tensor operators have been introduced, called crystal tensor operators, which 
transform as 

$$ J_3(\tau^j_m) = m \, \tau^j_m, \qquad J_\pm(\tau^j_m) = \tau^j_{m \pm 1} \tag{55} $$

Clearly, if |m| > j then $\tau^j_m$ has to be considered vanishing. 
The (q → 0)-Wigner-Eckart theorem can be written ($j_1 > j$) 

T L \ji m i) = (- l ) 2j til +3 ~ tt ll r lii) \h+3 - a,m 1 + m) 

a=0 

(,^mi,ji—a 3—m,j—a ^mi,ji—a^—m,j—a) (56) 

The (q → 0)-Wigner-Eckart theorem has the peculiar feature that the selection rules do 
not depend only on the rank of the tensor operator and on the initial state, but also, in a crucial 
way, on the specific component of the tensor under consideration. The tensor product of two 
irreducible representations in the crystal basis is not commutative (see sect. 2), therefore one 
has to specify which is the first representation. In the following, as in [12], the crystal tensor 
operator is taken as the first one. 

Let us also point out the following peculiar property of the crystal basis, which will be used in the 
following. We state it only for the case we are interested in, but it is a completely general 
property. 

An ordered sequence, or chain, of n nucleotides is a state belonging to an irreducible 
representation of $U_{q \to 0}(sl(2) \oplus sl(2))$ appearing in the n-fold tensor product of the fundamental irreducible 
representation (1/2, 1/2). Moreover the same property holds for any subsequence of m (m < n) 
nucleotides. We can mimic the deletion of a nucleotide N in a generic position of a coding 
sequence by a local annihilation operator of the nucleotide N. In order to take into account 
the observed fact that the deletion of the nucleotide depends on the nature of the neighboring 
nucleotides, we require the annihilation operator to behave as a definite crystal tensor 
operator under $U_{q \to 0}(sl(2))_V$ or $U_{q \to 0}(sl(2))_H$ or both. In our mathematical description we have to 
specify the action of the annihilation operator on a chain of nucleotides. If we assume that the 
annihilation of the nucleotide N behaves e.g. as a spinor crystal operator for $U_{q \to 0}(sl(2))_V$, 
we have to require that the deletion of N from the initial chain of K nucleotides, 
described by the state $|J_i, M_i; \Omega_i\rangle$, leading to the final chain of K - 1 nucleotides, described 
by the state $|J_f, M_f; \Omega_f\rangle$, is compatible with the (q → 0)-Wigner-Eckart theorem prescription 
for the action of the given crystal spinor operator between these initial and final states, 
where $\Omega$ denotes the set of all the labels necessary 
to identify the state completely. As we shall see, this is far from being trivial and puts 
constraints on the type of nucleotides surrounding the nucleotide N. We have to specify which 
chain has to be considered in order to study the action of the crystal tensor operator. It seems 
reasonable to take into account chains formed by K = 2 and 3 nucleotides starting from N in 
the reading direction of the codon sequence. So we are defining on the chain the action of a 
"matrioska" crystal tensor operator. We assume: 

Assumption: The biological mechanism responsible for the deletion of a pyrimidine C (resp. 
U) in a sequence can be schematized by a local crystal tensor operator $\tau^{1/2}_{-1/2}$ for $U_{q \to 0}(sl(2))_V$ 
and $\tau^{1/2}_{-1/2}$ (resp. $\tau^{1/2}_{1/2}$) for $U_{q \to 0}(sl(2))_H$, which transforms the state YX (resp. YXZ) into the 
state X (resp. XZ), X, Z being any nucleotide. 

By "local crystal tensor operator" we mean an operator which, in the RNA sequence, acts 
on the K-chain (K = 2, 3) starting with Y and deletes the pyrimidine, according to the selection 
rules imposed by the assumed type of the crystal tensor. 

Let us point out that, differently from ref. [11], where the DNA sequence was analyzed, we 
consider the transcribed RNA sequence and the deletion of a Y in the transcription. 

There are 8 possible cases (we denote the initial and final states with the notation of sect. 
2, and by A (resp. F) an allowed (resp. forbidden) transition; in each entry the first letter refers to 
$U_{q \to 0}(sl(2))_H$ and the second to $U_{q \to 0}(sl(2))_V$). We analyze the deletion of a C and of a U. 

Deletion of C: action of $\tau^{1/2}_{-1/2,H} \oplus \tau^{1/2}_{-1/2,V}$

initial → final            transition    H-V
(1,1) → (1/2,1/2)          CC → C        F-F
(0,1) → (1/2,1/2)          CU → U        A-F
(1,0) → (1/2,1/2)          CG → G        F-A
(0,0) → (1/2,1/2)          CA → A        A-A

Deletion of U: action of $\tau^{1/2}_{1/2,H} \oplus \tau^{1/2}_{-1/2,V}$

initial → final            transition    H-V
(1,1) → (1/2,1/2)          UC → C        A-F
(1,1) → (1/2,1/2)          UU → U        A-F
(1,0) → (1/2,1/2)          UG → G        A-A
(1,0) → (1/2,1/2)          UA → A        A-A

So, for the transition from a dinucleotide state to a one-nucleotide state, it follows from the 
assumed nature of the crystal tensor operator that a pyrimidine can be deleted if it is followed by 
a purine. Now let us consider the transition from a trinucleotide 
to a dinucleotide state. Using the previous result, we consider only the states in which a purine 
is in second position, so we have to consider 16 cases: 



Deletion of C: action of $\tau^{1/2}_{-1/2,H} \oplus \tau^{1/2}_{-1/2,V}$

initial → final              transitions                                   H-V
(1/2,1/2)^4 → (1,1)          CAC → AC, CAU → AU, CAA → AA, CAG → AG        A-A
(3/2,1/2)^2 → (1,1)          CGC → GC, CGG → GG                            F-A
(1/2,1/2)^2 → (0,1)          CGU → GU, CGA → GA                            F-A

Deletion of U: action of $\tau^{1/2}_{1/2,H} \oplus \tau^{1/2}_{-1/2,V}$

initial → final              transitions                                   H-V
(3/2,1/2)^2 → (1,1)          UAC → AC, UAU → AU, UAA → AA, UAG → AG        A-A
(3/2,1/2)^2 → (1,1)          UGC → GC, UGG → GG                            A-A
(1/2,1/2)^2 → (0,1)          UGU → GU, UGA → GA                            A-A



So, from the assumed nature of the crystal tensor operator, the transition from a trinucleotide 
to a dinucleotide state is horizontally forbidden for the deletion of a C if the second nucleotide 
is a G. 

Let us note that we have made the simplifying assumption that the transitions depend only on 
the values of $J_\alpha$, $J_{\alpha,3}$ of the initial and final states. 
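The allowed/forbidden patterns listed above for the K = 2 and K = 3 chains can be regenerated mechanically. The sketch below rests on assumptions that are inferred from the tables rather than taken from Eq. (56): the nucleotide labels are those of Eq. (2); the spin J of an ordered chain is obtained by cancelling every +1/2 that is immediately followed (after earlier cancellations) by a -1/2; and a spinor operator $\tau^{1/2}_m$, taken as the first factor, connects $(J_i, M_i)$ only to $J_f = J_i + 1/2 - \min(1/2 + m, J_i - M_i)$. All function names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
# (J_{H,3}, J_{V,3}) labels of the nucleotides, assumed to follow Eq. (2).
SPIN = {"C": (HALF, HALF), "U": (-HALF, HALF),
        "G": (HALF, -HALF), "A": (-HALF, -HALF)}


def crystal_state(chain, axis):
    """Return (J, M) of an ordered chain for axis 0 (H) or axis 1 (V).

    Assumed crystal-basis rule: a +1/2 directly followed by a -1/2 forms a
    singlet and is removed; the surviving spins are fully symmetrized.
    """
    stack = []
    for base in chain:
        s = SPIN[base][axis]
        if s < 0 and stack and stack[-1] > 0:
            stack.pop()          # cancel the (+1/2, -1/2) pair
        else:
            stack.append(s)
    m_total = sum(SPIN[base][axis] for base in chain)
    return HALF * len(stack), m_total


def allowed(j_i, m_i, j_f, m):
    """Assumed selection rule for tau^{1/2}_m acting as the first factor."""
    alpha = min(HALF + m, j_i - m_i)
    return j_f == j_i + HALF - alpha


def deletion_pattern(chain):
    """Pattern such as 'A-F' (H first, V second) for deleting the leading C or U."""
    pyrimidine, rest = chain[0], chain[1:]
    if pyrimidine not in "CU":
        raise ValueError("the chain must start with a pyrimidine")
    letters = []
    for axis in (0, 1):
        j_i, m_i = crystal_state(chain, axis)
        j_f, _ = crystal_state(rest, axis)
        m = -SPIN[pyrimidine][axis]  # removing the nucleotide shifts J_3 by -label
        letters.append("A" if allowed(j_i, m_i, j_f, m) else "F")
    return "-".join(letters)


if __name__ == "__main__":
    for chain in ("CC", "CU", "CG", "CA", "UC", "UU", "UG", "UA", "CGU", "UGA"):
        print(chain, "->", chain[1:], deletion_pattern(chain))
```

Because the rule only looks at J and $J_3$ of the initial and final chains, it embodies the simplifying assumption stated above; it is not applied here to the four-nucleotide case, where the deleted pyrimidine is in second position.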

Moreover, both to take into account the data of [11] and to check that the results are not 
very sensitive to the choice of the initial state, we consider the deletion of a pyrimidine in second 
position in a four-nucleotide state, and impose that the process may take place only if the initial 
and final states can be connected by a spinor crystal operator $\tau^{1/2}_{-1/2,H} \oplus \tau^{1/2}_{-1/2,V}$ for the deletion 
of C, or $\tau^{1/2}_{1/2,H} \oplus \tau^{1/2}_{-1/2,V}$ for the deletion of U. 

As the two pyrimidines differ by their value of $J_{H,3}$, the constraints imposed by the tensor 
operator $\tau^{1/2}_{\pm 1/2,H}$ are weaker than those imposed by the tensor operator $\tau^{1/2}_{-1/2,V}$. 

In the Appendix (resp. in sect. 2) we have reported all the irreducible representations arising from the 
4-fold (resp. 3-fold) tensor product of the fundamental representation. A detailed analysis shows 
that only the following deletions may happen (we report all the transitions that are allowed at 
least once): 



Deletion of C: action of $\tau^{1/2}_{-1/2,H} \oplus \tau^{1/2}_{-1/2,V}$
(upper labels of the irreducible representations are given where they are legible)

initial → final                 transitions                                                 H-V
(2,1)^3 → (3/2,3/2)             GCGC → GGC, ACGC → AGC, GCGG → GGG, ACGG → AGG              F-A
(2,0)^2 → (3/2,1/2)             CCGG → CGG, UCGG → UGG                                      F-A
(1,1) → (1/2,3/2)               GCGU → GGU, ACGU → AGU, GCGA → GGA, ACGA → AGA              F-A
(1,0)^4 → (1/2,1/2)^2           CCGA → CGA, UCGA → UGA                                      F-A
(1,1)^9 → (1/2,3/2)^2           GCAC → GAC, GCAG → GAG                                      F-A
(1,0)^6 → (1/2,1/2)^4           CCAG → CAG                                                  F-A
(1,1)^9 → (3/2,3/2)             ACAC → AAC, ACAU → AAU, ACAG → AAG, ACAA → AAA              A-A
(1,0)^6 → (3/2,1/2)^2           UCAG → UAG, UCAA → UAA                                      A-A
(1,2)^3 → (3/2,3/2)             UCUC → UUC, UCUU → UUU, ACUC → AUC, ACUU → AUU              A-F
(0,2)^2 → (1/2,3/2)             CCUU → CUU, GCUU → GUU                                      A-F
(1,1)^8 → (3/2,1/2)^1           UCUG → UUG, UCUA → UUA, ACUG → AUG, ACUA → AUA              A-F
(0,1)^5 → (1/2,1/2)^3           CCUA → CUA, GCUA → GUA                                      A-F
(1,1) → (3/2,1/2)               UCAC → UAC, UCAU → UAU                                      A-F
(0,1)^6 → (1/2,1/2)^4           CCAU → CAU                                                  A-F
(0,0)^4 → (1/2,1/2)^4           CCAA → CAA                                                  A-A
(0,1)^6 → (1/2,3/2)^2           GCAU → GAU, GCAA → GAA                                      A-A



So we remark: 

• The deletion of C, allowed or horizontally forbidden, may happen only if C is followed 
by a purine. In the allowed cases, it must be followed by the nucleotide A, in agreement 
with the observed data. 

• A nucleotide A before the deleted nucleotide C appears only in the transition $(1,1)^9 \to (3/2,3/2)$. 
This feature is present in the observed data with a very low occurrence, which 
in our language would mean that the matrix element of $\tau$ between these two irreducible 
representations is small. 
Now we consider the case of the deletion of U. A detailed analysis shows that only the following 
deletions may happen: 

Deletion of U: action of $\tau^{1/2}_{1/2,H} \oplus \tau^{1/2}_{-1/2,V}$
(upper labels of the irreducible representations are given where they are legible)

initial → final                 transitions                                                 H-V
(1,2) → (1/2,3/2)^2             CUUC → CUC, CUUU → CUU, GUUC → GUC, GUUU → GUU              A-F
(1,2)^2 → (3/2,3/2)             CUCC → CCC, GUCC → GCC                                      A-F
(1,1) → (1/2,1/2)^4             CUAC → CAC, CUAU → CAU                                      A-F
(1,1)^6 → (1/2,1/2)^3           UUCA → UCA, AUCA → ACA                                      A-F
(2,1)^2 → (3/2,1/2)^1           UUCG → UCG, UUUG → UUG, UUUA → UUA,
                                AUCG → ACG, AUUG → AUG, AUUA → AUA                          A-F
(2,1)^3 → (3/2,1/2)^2           UUGC → UGC, UUAC → UAC, UUAU → UAU                          A-F
(1,1) → (1/2,1/2)               UUGU → UGU                                                  A-F
(0,1)^4 → (1/2,1/2)             CUGU → CGU                                                  A-F
(1,1) → (1/2,1/2)^3             CUUG → CUG, GUUG → GUG, CUUA → CUA, GUUA → GUA              A-F
(1,1)^2 → (3/2,1/2)^1           CUCG → CCG, GUCG → GCG                                      A-F
(1,2) → (1/2,3/2)^1             UUCU → UCU, AUCU → ACU                                      A-F
(0,2) → (1/2,3/2)^1             CUCU → CCU, GUCU → GCU                                      A-F
(2,2) → (3/2,3/2)               UUCC → UCC, UUUC → UUC, UUUU → UUU,
                                AUCC → ACC, AUUC → AUC, AUUU → AUU                          A-F
(0,1)^3 → (1/2,1/2)^1           CUCA → CCA, GUCA → GCA                                      A-F
(1,1)^3 → (3/2,1/2)             CUGC → CGC                                                  A-F
(1,1) → (1/2,3/2)^1             AUGU → AGU, AUGA → AGA                                      A-A
(0,1)^4 → (1/2,3/2)^1           GUGU → GGU, GUGA → GGA                                      A-A
(1,1) → (1/2,3/2)^2             GUAC → GAC, GUAU → GAU, GUAG → GAG, GUAA → GAA              A-A
(2,0)^2 → (3/2,1/2)^2           UUGG → UGG, UUAG → UAG, UUAA → UAA                          A-A
(1,0)^2 → (3/2,1/2)^2           CUGG → CGG                                                  A-A
(1,1) → (3/2,3/2)               GUGC → GGC, GUGG → GGG                                      A-A
(1,0)^2 → (1/2,1/2)^4           CUAG → CAG, CUAA → CAA                                      A-A
(2,1)^3 → (3/2,3/2)             AUGC → AGC, AUAC → AAC, AUAU → AAU,
                                AUGG → AGG, AUAG → AAG, AUAA → AAA                          A-A
(0,0)^2 → (1/2,1/2)^2           CUGA → CGA                                                  A-A



So we remark: 

• The deletion of U may happen only if it is followed by A or by G. In the observed 
data only A is considered; however, in [11] the reported deletions of U are about 1/4 
of the reported deletions of C. So our modelling just predicts a different 
environment for the deletion of U and of C. 

• The last nucleotide in the four-nucleotide sequence in which the deletion occurs may be 
any nucleotide, but the case in which it is a purine seems more frequent than the case in 
which it is a pyrimidine. 

• There are no transitions which are only horizontally forbidden. 

In conclusion, both from the transitions from the K-chains (K = 2, 3) to the 
(K - 1)-chains and from the transitions from the four-nucleotide states to the three-nucleotide states 
under the action of the crystal tensor operators, we deduce that the deletion of a pyrimidine 
may happen if it is followed by a purine. In particular, for the deletion of C the preferred purine 
is the adenine A, whilst for the deletion of U the guanine G may also appear. This makes a 
difference between the two cases, and it would be extremely interesting to see whether more accurate 
data confirm this asymmetry. Moreover, the next following nucleotide may be of any type, 
but there is an indication that a purine is preferred. So our mathematical scheme explains the 
main features of the observed data [11]. A more quantitative analysis would require higher 
statistics in the experimental data. 
7 Recent theoretical approaches: a comparison 



The use of continuous symmetries in the genetic code has been considered by different teams 
in recent years [footnote 2]. It appears of some importance to summarize each of these approaches, and 
to make clear how the model we propose differs from them. 

In 1993, an underlying symmetry based on a continuous group was proposed [13]. More 
precisely, considering the eukaryotic code, the authors tried to answer the following question: 
is it possible to determine a Lie algebra $\mathcal{G}$ carrying a 64-dimensional irreducible representation 
R and admitting a subalgebra $\mathcal{H}$ such that the decomposition of R into irreducible multiplets 
under $\mathcal{H}$ gives exactly the 21 different multiplets, the different codons in each of the first 
20 multiplets being associated with the same amino-acid, the last multiplet containing the stop 
codons? They proposed as starting symmetry the symplectic algebra sp(6), which indeed 
admits an irreducible representation of dimension 64, equal to the number of different codons, 
with the successive breakings: 

$$ sp(6) \supset sp(4) \otimes su(2) \supset su(2) \otimes su(2) \otimes su(2) \supset su(2) \otimes U(1) \otimes su(2) \supset su(2) \otimes U(1) \otimes U(1) \tag{57} $$

Such a chain of symmetry breaking could be considered as reflecting the evolution of the genetic 
code, the six amino-acids relative to the codons in the irreducible representations obtained after 
the first breaking (in which 64 = 16 + 4 + 20 + 10 + 12 + 2) appearing as primordial 
amino-acids in their approach. However, in order to reproduce the actual 
multiplet pattern, the authors were obliged to assume in the final breaking a partial breaking or "freezing", in the 
sense that the breaking of the last su(2) into U(1) does not occur for all the multiplets. As 
an example, such a freezing has to be imposed on the sextets corresponding to Leu and Ser, 
which otherwise would decompose into three doublets. In the same way, freezing forbids 
the doublets related to Lys and Cys to split into singlets. 

In a second paper, dated 1997 [14], a refinement of this approach was considered, 
with the use of Lie groups instead of Lie algebras: global properties, for example the 
non-connectedness of $O(2) = U(1) \times Z_2$, can then be exploited. In this context, the authors proposed another 
chain of breaking starting with the exceptional group $G_2$, which also admits a 64-dimensional 
irreducible representation. But here again, the freezing pathology cannot be avoided. 

One can also mention the work of [15], where the unifying algebra before breaking is so(14). 

Meanwhile (1997), interpreting the double origin of the nucleotides, each arising either from 
purine or from pyrimidine, as a $Z_2$-grading, a supersymmetric model was proposed [16], involving 
superalgebras for such a program. The $Z_2$-grading specific to a simple superalgebra is there 
used to separate purines and pyrimidines: indeed, by putting the four nucleotides in the 
4-dimensional representation of su(2|1), one can confer an even grading on the A and G purines (R) 
and an odd grading on the C and U pyrimidines (Y); note that the R states are then in 
an su(2) doublet and the Y ones in su(2) singlets. The notion of polarity spin is also introduced, 
allowing one to distinguish the C and G nucleotides, with two locally polarized sites, from the A 
and U ones, with three polarized sites: the C and G (resp. A and U) are assigned to a 
doublet (resp. to singlets) of another su(2). Then the authors consider the sum of algebras 
su(2) ⊕ su(2) ⊕ su(2|1), with the first (second) su(2) acting as polarity spin on the first (second) 
nucleotide of a codon, and the su(2|1) acting on the third nucleotide only. Moreover the two 
su(2) would act in an alternating way on the first and second position, that is as 1/2, -1/2 and 
-1/2, 1/2. This sum of algebras can be embedded in the superalgebra su(6|1), which admits a 
64-dimensional irreducible representation and could also be used for a superalgebraic approach 
to the evolution of the genetic code, with the chain of symmetry breaking: 

$$ su(6|1) \supset su(2) \oplus su(3|1) \supset su(2) \oplus su(2) \oplus su(2|1) \supset U(1) \oplus U(1) \oplus su(2|1) \supset U(1) \oplus U(1) \oplus gl(1|1) \tag{58} $$

Again the problem of freezing, i.e. the fact that the last breaking applies to some but not all the multiplets, 
is present with this choice of (super)algebras. 

Footnote 2: See section "Symmetry techniques in Biological Systems" in Proc. XXII Int. Coll. on Group Theoretical 
Methods in Physics, pp. 142-165. 

It seems necessary to remark that in this proposal which implies (super) algebras acting in 
the same time on nucleotides and on codons - one must say in a rather complicated way - 
the nucleotides cannot appear as building blocks from which one algebraically constructs the 
codons, by performing tensorial products of representations, as is the case of our model. In fact, 
the problem of ordering the nucleotides inside a codon forbids this natural way of proceeding as 
long as only usual (super) algebras are involved. Note that it is the limit of quantum algebras 
that we use in our approach: then, we have at hand the so-called crystal bases, which exactly 
solve the ordering problem. 

In a preprint from last month, two authors of the same team [5] proposed to fit biophysical properties of nucleic acids by constructing polynomials in 6 coordinates in the 64-dimensional codon space. As already mentioned in sect. 4, the two coordinates they associate with each nucleotide are directly related to the nucleotide eigenvalues of our model. The authors present their computations as independent of a particular choice of algebra or superalgebra, as long as the underlying algebra is of rank 6 (which is in particular the dimension of the Cartan subalgebra of su(6|1)) and admits a 64-dimensional irreducible representation. We note that our model does allow one to calculate the biophysical quantities considered in ref. [5] without the constraint on representations and, more importantly, with only a rank-two algebra.

A detailed and systematic study of superalgebras and superalgebra breaking chains has been performed by the authors of [17]: it is the orthosymplectic superalgebra osp(5|2) which emerges from their algebraic analysis.

Finally, it is remarkable that, just a few years after the concept of the genetic code was formulated, an attempt to give a mathematical description of its properties was started by the Russian physicist Yu. B. Rumer [18]. Indeed, he remarked that the 16 roots, i.e. the combinations of the first two nucleotides of a codon, divide into a strong octet, whose roots form quartets or sub-parts of sextets, and a weak octet, whose roots form doublets, triplets and singlets, in an attempt to give a systematic description of the genetic code. A few years later, he and B.G. Konopel'chenko [19] formulated the strong assumption that, with respect to any property of the codons, the 16 roots can be gathered into two octets with opposite "charge", whose positive (negative) value characterizes the strong (weak) roots. This description comes out naturally in our model, such a charge Q being defined in Eq. (5) of sect. 2.
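
Rumer's division of the 16 roots can be checked directly on the standard genetic code. The following is a minimal sketch, not part of the model itself: it calls a root "strong" when its four codons all code for the same amino acid (a quartet or a sub-part of a sextet) and "weak" otherwise. The compact code string and the helper name `split_roots` are illustrative choices; the charge Q of Eq. (5) is not computed here.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated with each position running over U, C, A, G
# (third position varying fastest); '*' marks a termination codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def split_roots(code):
    """Split the 16 dinucleotide roots into strong and weak ones.

    A root is counted as strong when the third nucleotide is irrelevant,
    i.e. all four codons built on it give the same product.
    """
    strong, weak = [], []
    for root in ("".join(p) for p in product(BASES, repeat=2)):
        products = {code[root + third] for third in BASES}
        (strong if len(products) == 1 else weak).append(root)
    return strong, weak

strong_roots, weak_roots = split_roots(CODE)
print("strong:", strong_roots)
print("weak:  ", weak_roots)
```

With the standard code this gives two octets, in agreement with the statement above.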

8 Conclusion 

Our model is based on the algebra $U_{q \to 0}(sl(2) \oplus sl(2))$, which we have chosen for two main characteristics. First, it encodes the stereochemical property of a base, and also reflects the complementarity rule, by conferring quantum numbers on each nucleotide. Second, it admits representation spaces, or crystal bases, in which an ordered sequence of nucleotides, or codon, can be suitably characterized. Let us emphasize that $U_{q \to 0}(sl(2) \oplus sl(2))$ is really neither a Lie algebra nor a deformed enveloping algebra. We still use the word algebra in a loose sense, just to emphasize the fact that we largely use the mathematical tools of representation spaces, tensor operators, etc., which are typical of algebraic structures. Let us add that it is a remarkable property of a quantum algebra in the limit $q \to 0$ to admit representations, obtained from the tensor product of basic ones, in which each state appears as a unique sequence of ordered basic elements.

In this framework, the codon/amino-acid correspondence is realized by the operator $\mathcal{R}_c$, constructed out of the symmetry algebra and acting on codons: the eigenvalues of $\mathcal{R}_c$ on two codons will be equal or different depending on whether the two codons are associated with the same amino acid or with two different ones. It is remarkable that this correspondence can be obtained for all the genetic codes, and that the reading operators have a bulk common to the various genetic codes (the prototype reading operator) and differ only by a few additive terms, analogous to the perturbative terms present in most Hamiltonians describing complex physical systems. Moreover, they depend on parameters, at present assumed to be constants, which in principle can be considered as functions of suitable variables. These features may be of some interest in the study of the evolution of the genetic code, a problem which has not yet been tackled in our model.

Then, restricting to the case of states made of two nucleotides, the experimental values of the free energy released by base pairing in the formation of double-stranded nucleic acids, of the hydrophobicity and of the hydrophilicity have been fitted with expressions depending respectively on 2, 4 and 4 parameters and constructed out of the generators of $U_{q \to 0}(sl(2) \oplus sl(2))$.

The model does not necessarily assign the codons in a multiplet (in particular the quartets, sextets and triplet) to the same irreducible representation. Let us remark that the assignment of the codons to the different irreducible representations is a straightforward consequence of the tensor product, once the nucleotides have been assigned to the fundamental irreducible representation. This feature is relevant, since it can explain the correlation between the branching ratios of the codon usage of different codons coding for the same amino acid, as discussed in [2] and [9]. Here we have shown that the universal pattern (inside the class of vertebrates) of $B_{AG}/B_{UC}$ can simply be reproduced in our model.

Moreover, our mathematical description of the genetic code allows the modelling of some biological processes. A first step in this direction has been presented in sect. 6, where we have shown that the observed data related to a pyrimidine deletion can be simulated by introducing the concept of a $q \to 0$, or crystal, tensor operator. Finally, let us mention some directions for the future development of our model. Going further in the analysis of the branching ratios, we want to refine our analysis and make a more detailed study taking into account the dependence on the family of biological species. Indeed, a preliminary analysis of plants, invertebrates and bacteria shows that, even if the pattern of the correlation is still approximately present, large deviations appear, which presumably indicates that the dependence on the subclass or family of biological species can no longer be neglected, in contrast to the case of vertebrates. A further investigation of the possibility of mathematically modelling or simulating biological processes, in particular mutations, by crystal tensor operators is in progress. Other questions are still to be investigated: in particular, how could the evolution of the genetic code be reproduced in our model?
Figure 1: Classification of the codons in the different crystal bases.
[Figure not reproduced. Panel labels recoverable from the source: (1/2, 3/2)^1, (3/2, 1/2)^1, (1/2, 1/2)^1, (1/2, 3/2)^2, (3/2, 1/2)^2, (1/2, 1/2)^2, (1/2, 1/2)^4.]
Table 4: The eukaryotic code. The upper label denotes different irreducible representations.

codon  a.a.  (J_H, J_V)        codon  a.a.  (J_H, J_V)

CCC    Pro   (3/2, 3/2)        UCC    Ser   (3/2, 3/2)
CCU    Pro   (1/2, 3/2)^1      UCU    Ser   (1/2, 3/2)^1
CCG    Pro   (3/2, 1/2)^1      UCG    Ser   (3/2, 1/2)^1
CCA    Pro   (1/2, 1/2)^1      UCA    Ser   (1/2, 1/2)^1

CUC    Leu   (1/2, 3/2)^2      UUC    Phe   (3/2, 3/2)
CUU    Leu   (1/2, 3/2)^2      UUU    Phe   (3/2, 3/2)
CUG    Leu   (1/2, 1/2)^3      UUG    Leu   (3/2, 1/2)^1
CUA    Leu   (1/2, 1/2)^3      UUA    Leu   (3/2, 1/2)^1

CGC    Arg   (3/2, 1/2)^2      UGC    Cys   (3/2, 1/2)^2
CGU    Arg   (1/2, 1/2)^2      UGU    Cys   (1/2, 1/2)^2
CGG    Arg   (3/2, 1/2)^2      UGG    Trp   (3/2, 1/2)^2
CGA    Arg   (1/2, 1/2)^2      UGA    Ter   (1/2, 1/2)^2

CAC    His   (1/2, 1/2)^4      UAC    Tyr   (3/2, 1/2)^2
CAU    His   (1/2, 1/2)^4      UAU    Tyr   (3/2, 1/2)^2
CAG    Gln   (1/2, 1/2)^4      UAG    Ter   (3/2, 1/2)^2
CAA    Gln   (1/2, 1/2)^4      UAA    Ter   (3/2, 1/2)^2

GCC    Ala   (3/2, 3/2)        ACC    Thr   (3/2, 3/2)
GCU    Ala   (1/2, 3/2)^1      ACU    Thr   (1/2, 3/2)^1
GCG    Ala   (3/2, 1/2)^1      ACG    Thr   (3/2, 1/2)^1
GCA    Ala   (1/2, 1/2)^1      ACA    Thr   (1/2, 1/2)^1

GUC    Val   (1/2, 3/2)^2      AUC    Ile   (3/2, 3/2)
GUU    Val   (1/2, 3/2)^2      AUU    Ile   (3/2, 3/2)
GUG    Val   (1/2, 1/2)^3      AUG    Met   (3/2, 1/2)^1
GUA    Val   (1/2, 1/2)^3      AUA    Ile   (3/2, 1/2)^1

GGC    Gly   (3/2, 3/2)        AGC    Ser   (3/2, 3/2)
GGU    Gly   (1/2, 3/2)^1      AGU    Ser   (1/2, 3/2)^1
GGG    Gly   (3/2, 3/2)        AGG    Arg   (3/2, 3/2)
GGA    Gly   (1/2, 3/2)^1      AGA    Arg   (1/2, 3/2)^1

GAC    Asp   (1/2, 3/2)^2      AAC    Asn   (3/2, 3/2)
GAU    Asp   (1/2, 3/2)^2      AAU    Asn   (3/2, 3/2)
GAG    Glu   (1/2, 3/2)^2      AAG    Lys   (3/2, 3/2)
GAA    Glu   (1/2, 3/2)^2      AAA    Lys   (3/2, 3/2)
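
The (J_H, J_V) labels of Table 4 can be recomputed from the ordered sequence of nucleotides. The sketch below is an illustration under two assumptions that are not restated in this section: the nucleotides carry the sign pairs C = (+, +), U = (−, +), G = (+, −), A = (−, −) for (J_H, J_V), and the crystal tensor product is evaluated with the usual bracketing (signature) rule for sl(2) crystals, in which a "+" immediately followed by a "−" is cancelled and each surviving letter contributes 1/2 to the spin. The function names are hypothetical, and the upper multiplicity labels are not computed.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumed (J_H, J_V) signs of the four nucleotides (see the lead-in).
SIGNS = {"C": (+1, +1), "U": (-1, +1), "G": (+1, -1), "A": (-1, -1)}

def spin(signs):
    """Spin of a sign word under the bracketing rule.

    Reading left to right, a '-' cancels the nearest uncancelled '+' on its
    left if that '+' is the last surviving letter; each survivor counts 1/2.
    """
    stack = []
    for s in signs:
        if s < 0 and stack and stack[-1] > 0:
            stack.pop()          # cancel an adjacent '+ -' pair
        else:
            stack.append(s)
    return Fraction(len(stack), 2)

def codon_irrep(word):
    """Return (J_H, J_V) of the irreducible representation containing `word`."""
    j_h = spin([SIGNS[n][0] for n in word])
    j_v = spin([SIGNS[n][1] for n in word])
    return (j_h, j_v)

# Number of codons carrying each (J_H, J_V) label.
counts = Counter(codon_irrep("".join(c)) for c in product("CUGA", repeat=3))

for codon in ("CCC", "CUC", "CGC", "CAC", "UAC", "GGU"):
    print(codon, codon_irrep(codon))
```

Under these assumptions each of the four label pairs is carried by 16 codons, matching one (3/2, 3/2), two (3/2, 1/2), two (1/2, 3/2) and four (1/2, 1/2) representations.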
Table 5: Biological species sample used in the analysis of sect. 5

 #   Species                 Number of sequences   Number of codons

 1   Homo sapiens            [illegible]           [illegible]
 2   Rattus norvegicus       [illegible]           [illegible]
 3   Gallus gallus           [illegible]           [illegible]
 4   Xenopus laevis          [illegible]           [illegible]
 5   Bos taurus              [illegible]           [illegible]
 6   Oryctolagus cuniculus   [illegible]           [illegible]
 7   Sus scrofa              [illegible]           [illegible]
 8   [illegible]             [illegible]           [illegible]
 9   Rattus rattus           342                   153049
10   Canis familiaris        317                   142944
11   Rattus sp.              299                   112039
12   Ovis aries              327                   101591
13   Fugu rubripes           157                   95979



Table 6: B_AG ratios for the quartets

 #    Pro    Ala    Thr    Ser    Val    Leu    Arg    Gly

 1    2.34   2.03   2.29   2.51   0.23   0.17   0.53   0.99
 2    2.40   2.17   2.33   2.35   0.22   0.17   0.61   1.03
 3    1.77   1.90   1.96   1.93   0.25   0.14   0.52   1.02
 4    4.10   4.23   4.08   3.45   0.48   0.32   1.00   1.67
 5    2.02   1.80   1.94   2.32   0.21   0.14   0.56   1.01
 6    1.45   1.45   1.30   1.45   0.15   0.10   0.44   0.88
 7    1.60   1.60   1.52   1.69   0.16   0.12   0.46   0.89
 8    1.39   1.47   1.71   1.68   0.22   0.18   0.89   1.94
 9    2.28   1.97   2.19   2.26   0.21   0.17   0.66   1.03
10    2.09   1.72   1.81   1.90   0.21   0.15   0.49   1.01
11    2.22   2.15   2.27   2.24   0.21   0.16   0.62   1.07
12    2.15   1.60   1.76   1.99   0.15   0.13   0.60   1.08
13    1.60   1.40   1.28   1.42   0.17   0.12   0.73   1.23
Table 7: B_UC ratios for the quartets

 #    Pro    Ala    Thr    Ser    Val    Leu    Arg    Gly

 1    0.85   0.64   0.64   0.82   0.72   0.64   0.43   0.47
 2    0.91   0.69   0.61   0.78   0.59   0.57   0.48   0.49
 3    0.75   0.80   0.69   0.77   0.84   0.64   0.45   0.51
 4    1.27   1.15   1.05   1.17   1.26   1.24   0.98   0.87
 5    0.78   0.61   0.57   0.79   0.65   0.57   0.41   0.47
 6    0.62   0.47   0.46   0.54   0.51   0.43   0.29   0.34
 7    0.68   0.54   0.49   0.65   0.50   0.47   0.33   0.38
 8    1.02   0.88   0.69   0.83   0.82   0.64   0.60   0.68
 9    0.88   0.71   0.59   0.76   0.59   0.57   0.50   0.51
10    0.76   0.61   0.57   0.76   0.56   0.55   0.37   0.53
11    0.94   0.69   0.58   0.83   0.55   0.55   0.47   0.49
12    0.70   0.53   0.45   0.73   0.50   0.46   0.41   0.43
13    0.77   0.68   0.55   0.71   0.60   0.49   0.57   0.64



Table 8: B_AG/B_UC ratios for the quartets

 #    Pro    Ala    Thr    Ser    Val    Leu    Arg    Gly

 1    2.75   3.15   3.57   3.05   0.32   0.26   1.25   2.11
 2    2.63   3.15   3.81   3.02   0.38   0.30   1.28   2.10
 3    2.38   2.38   2.83   2.50   0.30   0.21   1.14   2.00
 4    3.22   3.69   3.89   2.96   0.38   0.25   1.02   1.92
 5    2.60   2.96   3.40   2.92   0.32   0.25   1.36   2.17
 6    2.33   3.08   2.80   2.67   0.29   0.24   1.55   2.60
 7    2.34   2.97   3.11   2.60   0.32   0.25   1.38   2.36
 8    1.36   1.68   2.48   2.03   0.27   0.27   1.48   2.87
 9    2.58   2.78   3.68   2.98   0.36   0.31   1.32   2.00
10    2.74   2.82   3.17   2.51   0.38   0.28   1.32   1.91
11    2.36   3.14   3.93   2.71   0.38   0.29   1.34   2.18
12    3.08   3.03   3.92   2.72   0.30   0.28   1.45   2.52
13    2.09   2.06   2.31   2.01   0.27   0.24   1.28   1.92
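
The entries of Table 8 are the quotients of the corresponding entries of Tables 6 and 7. As a small consistency check, the sketch below recomputes them for rows 1 and 4 only (values copied from the three tables; the dictionary and function names are illustrative). Since the tabulated values are rounded to two decimals, agreement is only expected up to a few hundredths.

```python
AMINO_ACIDS = ["Pro", "Ala", "Thr", "Ser", "Val", "Leu", "Arg", "Gly"]

# Rows 1 and 4 of Table 6 (B_AG), Table 7 (B_UC) and Table 8 (B_AG/B_UC).
B_AG = {
    1: [2.34, 2.03, 2.29, 2.51, 0.23, 0.17, 0.53, 0.99],
    4: [4.10, 4.23, 4.08, 3.45, 0.48, 0.32, 1.00, 1.67],
}
B_UC = {
    1: [0.85, 0.64, 0.64, 0.82, 0.72, 0.64, 0.43, 0.47],
    4: [1.27, 1.15, 1.05, 1.17, 1.26, 1.24, 0.98, 0.87],
}
TABLE_8 = {
    1: [2.75, 3.15, 3.57, 3.05, 0.32, 0.26, 1.25, 2.11],
    4: [3.22, 3.69, 3.89, 2.96, 0.38, 0.25, 1.02, 1.92],
}

def recomputed_ratios(row):
    """Quotient B_AG / B_UC for each quartet of the given row."""
    return [ag / uc for ag, uc in zip(B_AG[row], B_UC[row])]

def max_deviation(row):
    """Largest absolute difference between the recomputed and tabulated ratios."""
    return max(abs(r - t) for r, t in zip(recomputed_ratios(row), TABLE_8[row]))

for row in sorted(B_AG):
    ratios = [round(r, 2) for r in recomputed_ratios(row)]
    print(row, dict(zip(AMINO_ACIDS, ratios)), "max deviation:", round(max_deviation(row), 3))
```

Row 4 illustrates the point made in the text: its individual B_AG and B_UC values are markedly larger than those of row 1, while the quotients of the two rows remain close.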



Table 9: F functions appearing in the B_AG/B_UC ratios

Amino acid   F_AG                                  F_UC

Pro          F_AG((1/2, 1/2)^1; (3/2, 1/2)^1)      F_UC((1/2, 3/2)^1; (3/2, 3/2))
Ala          F_AG((1/2, 1/2)^1; (3/2, 1/2)^1)      F_UC((1/2, 3/2)^1; (3/2, 3/2))
Thr          F_AG((1/2, 1/2)^1; (3/2, 1/2)^1)      F_UC((1/2, 3/2)^1; (3/2, 3/2))
Ser          F_AG((1/2, 1/2)^1; (3/2, 1/2)^1)      F_UC((1/2, 3/2)^1; (3/2, 3/2))
Val          F_AG((1/2, 1/2)^3; (1/2, 1/2)^3)      F_UC((1/2, 3/2)^2; (1/2, 3/2)^2)
Leu          F_AG((1/2, 1/2)^3; (1/2, 1/2)^3)      F_UC((1/2, 3/2)^2; (1/2, 3/2)^2)
Arg          F_AG((1/2, 1/2)^2; (3/2, 1/2)^2)      F_UC((1/2, 1/2)^2; (3/2, 1/2)^2)
Gly          F_AG((1/2, 3/2)^1; (3/2, 3/2))        F_UC((1/2, 3/2)^1; (3/2, 3/2))
Table 10: Amino-acid content of the representations in $\otimes^3 (1/2, 1/2)$

(3/2, 3/2) =
    P-Pro  S-Ser  F-Phe  F-Phe
    A-Ala  T-Thr  I-Ile  I-Ile
    G-Gly  S-Ser  N-Asn  N-Asn
    G-Gly  R-Arg  K-Lys  K-Lys

(3/2, 1/2)^1 =
    P-Pro  S-Ser  L-Leu  L-Leu
    A-Ala  T-Thr  M-Met  I-Ile

(3/2, 1/2)^2 =
    R-Arg  C-Cys  Y-Tyr  Y-Tyr
    R-Arg  W-Trp  Ter    Ter

(1/2, 3/2)^1 =
    P-Pro  S-Ser
    A-Ala  T-Thr
    G-Gly  S-Ser
    G-Gly  R-Arg

(1/2, 3/2)^2 =
    L-Leu  L-Leu
    V-Val  V-Val
    D-Asp  D-Asp
    E-Glu  E-Glu

(1/2, 1/2)^1 =
    P-Pro  S-Ser
    A-Ala  T-Thr

(1/2, 1/2)^2 =
    R-Arg  C-Cys
    R-Arg  Ter

(1/2, 1/2)^3 =
    L-Leu  L-Leu
    V-Val  V-Val

(1/2, 1/2)^4 =
    H-His  H-His
    Q-Gln  Q-Gln
Table 11: Four-fold tensor product of the (1/2, 1/2) representation of $U_{q \to 0}(sl(2) \oplus sl(2))$

$$
\begin{aligned}
(1/2,1/2) \otimes (1/2,1/2) \otimes (1/2,1/2) \otimes (1/2,1/2)
&= (1/2,1/2) \otimes \left[ (3/2,3/2) \oplus 2\,(3/2,1/2) \oplus 2\,(1/2,3/2) \oplus 4\,(1/2,1/2) \right] \\
&= (2,2) \oplus 3\,(2,1) \oplus 3\,(1,2) \oplus 9\,(1,1) \oplus 2\,(2,0) \\
&\quad \oplus 2\,(0,2) \oplus 6\,(1,0) \oplus 6\,(0,1) \oplus 4\,(0,0)
\end{aligned}
$$
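
The multiplicities in this decomposition can be cross-checked with a few lines of code. The sketch below assumes that the decomposition into crystal bases carries the same multiplicities as the ordinary sl(2) Clebsch-Gordan series, j ⊗ 1/2 = (j + 1/2) ⊕ (j − 1/2), applied independently to the two sl(2) factors; the function names are illustrative.

```python
from collections import Counter
from fractions import Fraction

HALF = Fraction(1, 2)

def tensor_with_half(mults):
    """Apply j (x) 1/2 = (j + 1/2) (+) (j - 1/2) to a multiset of spins."""
    out = Counter()
    for j, m in mults.items():
        out[j + HALF] += m
        if j > 0:                      # the spin-0 representation has no lower partner
            out[j - HALF] += m
    return out

def sl2_power(n):
    """Multiplicities of the spins appearing in the n-fold product of spin 1/2."""
    mults = Counter({HALF: 1})
    for _ in range(n - 1):
        mults = tensor_with_half(mults)
    return mults

def product_multiplicities(n):
    """Multiplicities of (J_H, J_V) in the n-fold product of (1/2, 1/2)."""
    one = sl2_power(n)
    return {(jh, jv): mh * mv for jh, mh in one.items() for jv, mv in one.items()}

mult4 = product_multiplicities(4)
total_dim = sum(m * (2 * jh + 1) * (2 * jv + 1) for (jh, jv), m in mult4.items())

for label in sorted(mult4, reverse=True):
    print(tuple(str(j) for j in label), mult4[label])
print("total dimension:", total_dim)
```

The total dimension equals 4^4 = 256, the number of ordered sequences of four nucleotides.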



One has (the upper label denotes different irreducible representations):

$$ (1/2,1/2) \otimes (3/2,3/2) = (2,2) \oplus (2,1)^1 \oplus (1,2)^1 \oplus (1,1)^1 $$

where

(2,2) =
    CCCC  UCCC  UUCC  UUUC  UUUU
    GCCC  ACCC  AUCC  AUUC  AUUU
    GGCC  AGCC  AACC  AAUC  AAUU
    GGGC  AGGC  AAGC  AAAC  AAAU
    GGGG  AGGG  AAGG  AAAG  AAAA

(1,2)^1 =
    CUCC  CUUC  CUUU
    GUCC  GUUC  GUUU
    GACC  GAUC  GAUU
    GAGC  GAAC  GAAU
    GAGG  GAAG  GAAA

(2,1)^1 =
    CGCC  UGCC  UACC  UAUC  UAUU
    CGGC  UGGC  UAGC  UAAC  UAAU
    CGGG  UGGG  UAGG  UAAG  UAAA

(1,1)^1 =
    CACC  CAUC  CAUU
    CAGC  CAAC  CAAU
    CAGG  CAAG  CAAA

$$ (1/2,1/2) \otimes (3/2,1/2)^1 = (2,1)^2 \oplus (2,0)^1 \oplus (1,1)^2 \oplus (1,0)^1 $$

where

(2,1)^2 =
    CCCG  UCCG  UUCG  UUUG  UUUA
    GCCG  ACCG  AUCG  AUUG  AUUA
    GGCG  AGCG  AACG  AAUG  AAUA

(2,0)^1 =
    CGCG  UGCG  UACG  UAUG  UAUA

(1,1)^2 =
    CUCG  CUUG  CUUA
    GUCG  GUUG  GUUA
    GACG  GAUG  GAUA

(1,0)^1 =
    CACG  CAUG  CAUA

$$ (1/2,1/2) \otimes (3/2,1/2)^2 = (2,1)^3 \oplus (2,0)^2 \oplus (1,1)^3 \oplus (1,0)^2 $$

where

(2,1)^3 =
    CCGC  UCGC  UUGC  UUAC  UUAU
    GCGC  ACGC  AUGC  AUAC  AUAU
    GCGG  ACGG  AUGG  AUAG  AUAA

(2,0)^2 =
    CCGG  UCGG  UUGG  UUAG  UUAA

(1,1)^3 =
    CUGC  CUAC  CUAU
    GUGC  GUAC  GUAU
    GUGG  GUAG  GUAA

(1,0)^2 =
    CUGG  CUAG  CUAA

$$ (1/2,1/2) \otimes (1/2,3/2)^1 = (1,2)^2 \oplus (0,2)^1 \oplus (1,1)^4 \oplus (0,1)^1 $$

where

(1,2)^2 =
    CCCU  UCCU  UUCU
    GCCU  ACCU  AUCU
    GGCU  AGCU  AACU
    GGGU  AGGU  AAGU
    GGGA  AGGA  AAGA

(0,2)^1 =
    CUCU
    GUCU
    GACU
    GAGU
    GAGA

(1,1)^4 =
    CGCU  UGCU  UACU
    CGGU  UGGU  UAGU
    CGGA  UGGA  UAGA

(0,1)^1 =
    CACU
    CAGU
    CAGA

$$ (1/2,1/2) \otimes (1/2,3/2)^2 = (1,2)^3 \oplus (0,2)^2 \oplus (1,1)^5 \oplus (0,1)^2 $$

where

(1,2)^3 =
    CCUC  UCUC  UCUU
    GCUC  ACUC  ACUU
    GGUC  AGUC  AGUU
    GGAC  AGAC  AGAU
    GGAG  AGAG  AGAA

(0,2)^2 =
    CCUU
    GCUU
    GGUU
    GGAU
    GGAA

(1,1)^5 =
    CGUC  UGUC  UGUU
    CGAC  UGAC  UGAU
    CGAG  UGAG  UGAA

(0,1)^2 =
    CGUU
    CGAU
    CGAA

$$ (1/2,1/2) \otimes (1/2,1/2)^1 = (1,1)^6 \oplus (1,0)^3 \oplus (0,1)^3 \oplus (0,0)^1 $$

where

(1,1)^6 =
    CCCA  UCCA  UUCA
    GCCA  ACCA  AUCA
    GGCA  AGCA  AACA

(0,1)^3 =
    CUCA
    GUCA
    GACA

(1,0)^3 =
    CGCA  UGCA  UACA

(0,0)^1 =
    CACA

$$ (1/2,1/2) \otimes (1/2,1/2)^2 = (1,1)^7 \oplus (1,0)^4 \oplus (0,1)^4 \oplus (0,0)^2 $$

where

(1,1)^7 =
    CCGU  UCGU  UUGU
    GCGU  ACGU  AUGU
    GCGA  ACGA  AUGA

(0,1)^4 =
    CUGU
    GUGU
    GUGA

(1,0)^4 =
    CCGA  UCGA  UUGA

(0,0)^2 =
    CUGA

$$ (1/2,1/2) \otimes (1/2,1/2)^3 = (1,1)^8 \oplus (1,0)^5 \oplus (0,1)^5 \oplus (0,0)^3 $$

where

(1,1)^8 =
    CCUG  UCUG  UCUA
    GCUG  ACUG  ACUA
    GGUG  AGUG  AGUA

(0,1)^5 =
    CCUA
    GCUA
    GGUA

(1,0)^5 =
    CGUG  UGUG  UGUA

(0,0)^3 =
    CGUA

$$ (1/2,1/2) \otimes (1/2,1/2)^4 = (1,1)^9 \oplus (1,0)^6 \oplus (0,1)^6 \oplus (0,0)^4 $$

where

(1,1)^9 =
    CCAC  UCAC  UCAU
    GCAC  ACAC  ACAU
    GCAG  ACAG  ACAA

(0,1)^6 =
    CCAU
    GCAU
    GCAA

(1,0)^6 =
    CCAG  UCAG  UCAA

(0,0)^4 =
    CCAA
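
The state content listed in Table 11 can be cross-checked by sorting all 256 ordered sequences of four nucleotides according to their (J_H, J_V) labels. This is a sketch under the same assumptions as for the codons, which are not restated in this table: sign pairs C = (+, +), U = (−, +), G = (+, −), A = (−, −), and the bracketing rule in which a "+" immediately followed by a "−" is cancelled. All names in the code are illustrative.

```python
from collections import Counter
from itertools import product

# Assumed (J_H, J_V) signs of the four nucleotides (see the lead-in).
SIGNS = {"C": (+1, +1), "U": (-1, +1), "G": (+1, -1), "A": (-1, -1)}

def twice_spin(signs):
    """Return 2j for a sign word: number of letters left after cancelling '+ -' pairs."""
    stack = []
    for s in signs:
        if s < 0 and stack and stack[-1] > 0:
            stack.pop()
        else:
            stack.append(s)
    return len(stack)

def label(word):
    """(J_H, J_V) of a four-letter word; both spins are integers for even length."""
    two_jh = twice_spin([SIGNS[n][0] for n in word])
    two_jv = twice_spin([SIGNS[n][1] for n in word])
    return (two_jh // 2, two_jv // 2)

words = ["".join(w) for w in product("CUGA", repeat=4)]
state_counts = Counter(label(w) for w in words)   # multiplicity times dimension
singlets = sorted(w for w in words if label(w) == (0, 0))

print(dict(state_counts))
print("singlets:", singlets)
```

Each count equals the multiplicity of the representation times its dimension (for instance 9 × 9 = 81 states with label (1,1)), and the four singlet states found are those listed above as (0,0)^1 to (0,0)^4.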
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